Estimating Model Performance under Domain Shifts with Class-Specific
Confidence Scores
- URL: http://arxiv.org/abs/2207.09957v1
- Date: Wed, 20 Jul 2022 15:04:32 GMT
- Title: Estimating Model Performance under Domain Shifts with Class-Specific
Confidence Scores
- Authors: Zeju Li and Konstantinos Kamnitsas and Mobarakol Islam and Chen Chen
and Ben Glocker
- Abstract summary: We introduce class-wise calibration within the framework of performance estimation for imbalanced datasets.
We conduct experiments on four tasks and find the proposed modifications consistently improve the estimation accuracy for imbalanced datasets.
- Score: 25.162667593654206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models are typically deployed in a test setting that differs
from the training setting, potentially leading to decreased model performance
because of domain shift. If we could estimate the performance that a
pre-trained model would achieve on data from a specific deployment setting, for
example a certain clinic, we could judge whether the model could safely be
deployed or if its performance degrades unacceptably on the specific data.
Existing approaches estimate this based on the confidence of predictions made
on unlabeled test data from the deployment's domain. We find existing methods
struggle with data that present class imbalance, because the methods used to
calibrate confidence do not account for bias induced by class imbalance,
consequently failing to estimate class-wise accuracy. Here, we introduce
class-wise calibration within the framework of performance estimation for
imbalanced datasets. Specifically, we derive class-specific modifications of
state-of-the-art confidence-based model evaluation methods including
temperature scaling (TS), difference of confidences (DoC), and average
thresholded confidence (ATC). We also extend the methods to estimate Dice
similarity coefficient (DSC) in image segmentation. We conduct experiments on
four tasks and find the proposed modifications consistently improve the
estimation accuracy for imbalanced datasets. Our methods improve accuracy
estimation by 18% in classification under natural domain shifts, and double
the estimation accuracy on segmentation tasks, when compared with prior
methods.
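To make the class-specific idea concrete, below is a minimal, illustrative sketch of class-wise temperature scaling followed by a class-specific thresholded-confidence estimate of per-class accuracy on unlabeled target data. This is not the authors' released implementation: the grouping by ground-truth class for temperature fitting, the quantile-based threshold rule, and the helper names (`fit_classwise_temperatures`, `classwise_atc_estimate`) are simplifying assumptions for illustration.

```python
# Sketch: class-wise temperature scaling + class-specific thresholded confidence.
# Assumes NumPy arrays of logits and integer labels; hypothetical helper names.
import numpy as np
from scipy.optimize import minimize_scalar


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_classwise_temperatures(val_logits, val_labels, num_classes):
    """Fit one temperature per class by minimising the validation NLL on the
    samples whose ground-truth label is that class (a simplifying assumption)."""
    temps = np.ones(num_classes)
    for c in range(num_classes):
        idx = val_labels == c
        if idx.sum() == 0:
            continue  # keep T=1 for classes absent from the validation set
        logits_c, labels_c = val_logits[idx], val_labels[idx]

        def nll(t, logits_c=logits_c, labels_c=labels_c):
            p = softmax(logits_c / max(t, 1e-3))
            return -np.log(p[np.arange(len(labels_c)), labels_c] + 1e-12).mean()

        temps[c] = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x
    return temps


def classwise_atc_estimate(val_logits, val_labels, test_logits, temps):
    """Estimate per-class accuracy on unlabeled test data using per-class
    confidence thresholds chosen on the labeled validation set."""
    num_classes = len(temps)
    est_acc = np.full(num_classes, np.nan)
    val_pred = val_logits.argmax(axis=1)
    test_pred = test_logits.argmax(axis=1)
    for c in range(num_classes):
        vmask, tmask = val_pred == c, test_pred == c
        if vmask.sum() == 0 or tmask.sum() == 0:
            continue
        # Calibrate confidences with the temperature of the predicted class.
        val_conf = softmax(val_logits[vmask] / temps[c]).max(axis=1)
        test_conf = softmax(test_logits[tmask] / temps[c]).max(axis=1)
        acc_c = (val_labels[vmask] == c).mean()
        # Pick the threshold so the fraction of validation samples above it
        # matches the observed class-wise accuracy, then apply it to the test set.
        thr = np.quantile(val_conf, 1.0 - acc_c)
        est_acc[c] = (test_conf >= thr).mean()
    return est_acc  # NaN for classes never predicted on either set
```

An overall accuracy estimate can then be obtained, for example, by weighting the per-class estimates by the predicted class frequencies on the target data; this weighting scheme is likewise an illustrative choice rather than the paper's exact procedure.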
Related papers
- Source-Free Domain-Invariant Performance Prediction [68.39031800809553]
We propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data.
Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short with limited source sample availability.
Our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation.
arXiv Detail & Related papers (2024-08-05T03:18:58Z) - Domain-adaptive and Subgroup-specific Cascaded Temperature Regression
for Out-of-distribution Calibration [16.930766717110053]
We propose a novel meta-set-based cascaded temperature regression method for post-hoc calibration.
We partition each meta-set into subgroups based on predicted category and confidence level, capturing diverse uncertainties.
A regression network is then trained to derive category-specific and confidence-level-specific scaling, achieving calibration across meta-sets.
arXiv Detail & Related papers (2024-02-14T14:35:57Z) - On the Calibration of Uncertainty Estimation in LiDAR-based Semantic
Segmentation [7.100396757261104]
We propose a metric to measure the confidence calibration quality of a semantic segmentation model with respect to individual classes.
We additionally suggest a second use of the method: automatically finding label problems in order to improve the quality of hand- or auto-annotated datasets.
arXiv Detail & Related papers (2023-08-04T10:59:24Z) - Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold; a minimal sketch of this estimator appears after the related-papers list.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference; a sketch of the basic DoC estimator also appears after the related-papers list.
arXiv Detail & Related papers (2021-07-07T15:50:18Z) - Closer Look at the Uncertainty Estimation in Semantic Segmentation under
Distributional Shift [2.05617385614792]
Uncertainty estimation for the task of semantic segmentation is evaluated under a varying level of domain shift.
It was shown that simple color transformations already provide a strong baseline.
An ensemble of models was utilized in the self-training setting to improve pseudo-label generation.
arXiv Detail & Related papers (2021-05-31T19:50:43Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
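For reference, the Average Thresholded Confidence entry above can be summarised with a minimal sketch. It assumes max-softmax confidences as the score function (the published method also supports other scores), so treat it as an illustration rather than the definitive implementation.

```python
# Sketch of vanilla ATC: choose a confidence threshold on labeled source data
# so that the fraction of source samples above it equals the source accuracy,
# then report the fraction of unlabeled target samples above that same
# threshold as the estimated target accuracy.
import numpy as np


def atc_estimate(src_conf, src_correct, tgt_conf):
    """src_conf: max-softmax confidence on labeled source samples.
    src_correct: boolean array, whether each source prediction was correct.
    tgt_conf: max-softmax confidence on unlabeled target samples."""
    src_acc = src_correct.mean()
    threshold = np.quantile(src_conf, 1.0 - src_acc)
    return float((tgt_conf >= threshold).mean())
```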
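Similarly, the difference-of-confidences (DoC) entry reduces, in its simplest form, to subtracting the confidence drop from the known source accuracy. The sketch below uses that simple form as an assumption; the original work also explores learned variants built on the same quantity.

```python
# Sketch of difference of confidences (DoC): the drop in average confidence
# between source and target data serves as a proxy for the drop in accuracy
# under the distribution shift.
import numpy as np


def doc_estimate(src_conf, src_acc, tgt_conf):
    """src_conf / tgt_conf: max-softmax confidences on source / target data.
    src_acc: accuracy measured on the labeled source (validation) set."""
    doc = src_conf.mean() - tgt_conf.mean()
    return float(src_acc - doc)  # predicted target accuracy
```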