Leveraging Unlabeled Data to Predict Out-of-Distribution Performance
- URL: http://arxiv.org/abs/2201.04234v1
- Date: Tue, 11 Jan 2022 23:01:12 GMT
- Title: Leveraging Unlabeled Data to Predict Out-of-Distribution Performance
- Authors: Saurabh Garg, Sivaraman Balakrishnan, Zachary C. Lipton, Behnam
Neyshabur, Hanie Sedghi
- Abstract summary: Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
- Score: 63.740181251997306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world machine learning deployments are characterized by mismatches
between the source (training) and target (test) distributions that may cause
performance drops. In this work, we investigate methods for predicting the
target domain accuracy using only labeled source data and unlabeled target
data. We propose Average Thresholded Confidence (ATC), a practical method that
learns a threshold on the model's confidence, predicting accuracy as the
fraction of unlabeled examples for which model confidence exceeds that
threshold. ATC outperforms previous methods across several model architectures,
types of distribution shifts (e.g., due to synthetic corruptions, dataset
reproduction, or novel subpopulations), and datasets (Wilds, ImageNet, Breeds,
CIFAR, and MNIST). In our experiments, ATC estimates target performance
$2$-$4\times$ more accurately than prior methods. We also explore the
theoretical foundations of the problem, proving that, in general, identifying
the accuracy is just as hard as identifying the optimal predictor and thus, the
efficacy of any method rests upon (perhaps unstated) assumptions on the nature
of the shift. Finally, analyzing our method on some toy distributions, we
provide insights concerning when it works.
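As a concrete illustration of the thresholding idea in the abstract, the sketch below estimates target accuracy with NumPy. It assumes maximum softmax probability as the confidence score (the paper also considers alternatives such as negative entropy); the helper names and the quantile-based threshold search are illustrative, not the authors' reference implementation.

```python
import numpy as np

def learn_threshold(src_conf, src_correct):
    # ATC idea: choose a threshold t on held-out, labeled *source* data so that
    # the fraction of source examples with confidence above t matches the
    # observed source accuracy.
    src_acc = src_correct.mean()
    return np.quantile(src_conf, 1.0 - src_acc)

def predict_target_accuracy(tgt_conf, t):
    # Predicted target accuracy: fraction of *unlabeled* target examples whose
    # confidence exceeds the learned threshold.
    return float((tgt_conf > t).mean())

# Illustrative usage with softmax outputs:
#   src_probs, tgt_probs: (N, K) softmax matrices; src_labels: (N,) labels.
# src_conf = src_probs.max(axis=1)
# src_correct = src_probs.argmax(axis=1) == src_labels
# t = learn_threshold(src_conf, src_correct)
# acc_hat = predict_target_accuracy(tgt_probs.max(axis=1), t)
```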
Related papers
- Source-Free Domain-Invariant Performance Prediction [68.39031800809553]
We propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data.
Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short with limited source sample availability.
Our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation.
arXiv Detail & Related papers (2024-08-05T03:18:58Z)
- Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples [1.1852406625172216]
We propose a new framework to estimate model accuracy on unlabeled target data without access to source data.
Our approach measures the disagreement rate between the source hypothesis and the target pseudo-labeling function.
Our proposed source-free framework effectively addresses challenging distribution shift scenarios and outperforms existing methods that require source data and labels for training.
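As a rough, hedged illustration of the disagreement idea described above (not the paper's exact procedure, which constructs the target pseudo-labeling function via domain-adaptive adversarial perturbation), accuracy can be read off from agreement between the source model and pseudo-labels on unlabeled target data:

```python
import numpy as np

def accuracy_from_disagreement(src_preds, tgt_pseudo_labels):
    # Disagreement rate between the source hypothesis and a pseudo-labeling
    # function on unlabeled target inputs; agreement serves as the accuracy
    # estimate under the assumption that the pseudo-labels are reliable.
    disagreement = np.mean(np.asarray(src_preds) != np.asarray(tgt_pseudo_labels))
    return 1.0 - disagreement
```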
arXiv Detail & Related papers (2023-07-19T15:33:11Z)
- Predicting Out-of-Distribution Error with Confidence Optimal Transport [17.564313038169434]
We present a simple yet effective method to predict a model's performance on an unknown distribution without any additional annotation.
We show that our method, Confidence Optimal Transport (COT), provides robust estimates of a model's performance on a target domain.
Despite its simplicity, our method achieves state-of-the-art results on three benchmark datasets and outperforms existing methods by a large margin.
arXiv Detail & Related papers (2023-02-10T02:27:13Z)
- Simultaneous Improvement of ML Model Fairness and Performance by Identifying Bias in Data [1.76179873429447]
We propose a data preprocessing technique that can detect instances ascribing a specific kind of bias that should be removed from the dataset before training.
In particular, we claim that in problem settings where instances exist with similar features but different labels caused by variation in protected attributes, an inherent bias is induced in the dataset.
arXiv Detail & Related papers (2022-10-24T13:04:07Z)
- Uncertainty-guided Source-free Domain Adaptation [77.3844160723014]
Source-free domain adaptation (SFDA) aims to adapt a classifier to an unlabelled target data set by only using a pre-trained source model.
We propose quantifying the uncertainty in the source model predictions and utilizing it to guide the target adaptation.
arXiv Detail & Related papers (2022-08-16T08:03:30Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference.
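A back-of-the-envelope version of the DoC signal, under the simplifying assumption that the drop in average confidence translates directly into a drop in accuracy (the paper also fits regressors on such features); variable names are illustrative:

```python
import numpy as np

def doc_accuracy_estimate(src_acc, src_conf, tgt_conf):
    # Difference of confidences (DoC): the change in mean confidence from the
    # source distribution to the shifted target distribution is used as a
    # proxy for the change in accuracy.
    doc = np.mean(src_conf) - np.mean(tgt_conf)
    return src_acc - doc
```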
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples (see the sketch after this entry).
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
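The transductive prototype update mentioned in the last entry can be sketched as follows. This is a simplified, hedged version: the per-query confidence that the paper meta-learns is replaced here by a fixed top-k selection based on distance, purely for illustration.

```python
import numpy as np

def refine_prototypes(prototypes, query_emb, top_k=5):
    # prototypes: (C, D) class prototypes; query_emb: (Q, D) unlabeled queries.
    # Assign each query to its nearest prototype, treat negative distance as
    # confidence, and fold the most confident queries of each class back into
    # that class's prototype.
    dists = np.linalg.norm(query_emb[:, None, :] - prototypes[None, :, :], axis=-1)
    preds = dists.argmin(axis=1)
    conf = -dists.min(axis=1)
    new_protos = prototypes.copy()
    for c in range(len(prototypes)):
        idx = np.where(preds == c)[0]
        if idx.size == 0:
            continue
        top = idx[np.argsort(conf[idx])[-top_k:]]
        new_protos[c] = np.vstack([prototypes[c:c + 1], query_emb[top]]).mean(axis=0)
    return new_protos
```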