Improving Predictor Reliability with Selective Recalibration
- URL: http://arxiv.org/abs/2410.05407v1
- Date: Mon, 7 Oct 2024 18:17:31 GMT
- Title: Improving Predictor Reliability with Selective Recalibration
- Authors: Thomas P. Zollo, Zhun Deng, Jake C. Snell, Toniann Pitassi, Richard Zemel,
- Abstract summary: Recalibration is one of the most effective ways to produce reliable confidence estimates with a pre-trained model.
We propose textitselective recalibration, where a selection model learns to reject some user-chosen proportion of the data.
Our results show that selective recalibration consistently leads to significantly lower calibration error than a wide range of selection and recalibration baselines.
- Score: 15.319277333431318
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A reliable deep learning system should be able to accurately express its confidence with respect to its predictions, a quality known as calibration. One of the most effective ways to produce reliable confidence estimates with a pre-trained model is by applying a post-hoc recalibration method. Popular recalibration methods like temperature scaling are typically fit on a small amount of data and work in the model's output space, as opposed to the more expressive feature embedding space, and thus usually have only one or a handful of parameters. However, the target distribution to which they are applied is often complex and difficult to fit well with such a function. To this end we propose \textit{selective recalibration}, where a selection model learns to reject some user-chosen proportion of the data in order to allow the recalibrator to focus on regions of the input space that can be well-captured by such a model. We provide theoretical analysis to motivate our algorithm, and test our method through comprehensive experiments on difficult medical imaging and zero-shot classification tasks. Our results show that selective recalibration consistently leads to significantly lower calibration error than a wide range of selection and recalibration baselines.
Related papers
- Few-Shot Recalibration of Language Models [23.829795148520834]
We train a recalibration model that takes in a few unlabeled examples from any given slice and predicts a curve that remaps confidence scores to be more accurate for that slice.
Our trained model can recalibrate for arbitrary new slices, without using any labeled data from that slice.
Experiments show that our few-shot recalibrator consistently outperforms existing calibration methods.
arXiv Detail & Related papers (2024-03-27T06:25:40Z) - Calibration by Distribution Matching: Trainable Kernel Calibration
Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - Multi-Head Multi-Loss Model Calibration [13.841172927454204]
We introduce a form of simplified ensembling that bypasses the costly training and inference of deep ensembles.
Specifically, each head is trained to minimize a weighted Cross-Entropy loss, but the weights are different among the different branches.
We show that the resulting averaged predictions can achieve excellent calibration without sacrificing accuracy in two challenging datasets.
arXiv Detail & Related papers (2023-03-02T09:32:32Z) - Sharp Calibrated Gaussian Processes [58.94710279601622]
State-of-the-art approaches for designing calibrated models rely on inflating the Gaussian process posterior variance.
We present a calibration approach that generates predictive quantiles using a computation inspired by the vanilla Gaussian process posterior variance.
Our approach is shown to yield a calibrated model under reasonable assumptions.
arXiv Detail & Related papers (2023-02-23T12:17:36Z) - On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness have impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z) - Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z) - Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z) - Confidence Calibration for Intent Detection via Hyperspherical Space and
Rebalanced Accuracy-Uncertainty Loss [17.26964140836123]
In some scenarios, users do not only care about the accuracy but also the confidence of model.
We propose a model using the hyperspherical space and rebalanced accuracy-uncertainty loss.
Our model outperforms the existing calibration methods and achieves a significant improvement on the calibration metric.
arXiv Detail & Related papers (2022-03-17T12:01:33Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Privacy Preserving Recalibration under Domain Shift [119.21243107946555]
We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints.
We also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets.
arXiv Detail & Related papers (2020-08-21T18:43:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.