A calibration test for evaluating set-based epistemic uncertainty representations
- URL: http://arxiv.org/abs/2502.16299v1
- Date: Sat, 22 Feb 2025 17:10:45 GMT
- Title: A calibration test for evaluating set-based epistemic uncertainty representations
- Authors: Mira Jürgens, Thomas Mortier, Eyke Hüllermeier, Viktor Bengs, Willem Waegeman
- Abstract summary: We propose a novel statistical test to determine whether there is a convex combination of the set's predictions that is calibrated in distribution. In contrast to previous methods, our framework allows the convex combination to be instance dependent, recognizing that different ensemble members may be better calibrated in different regions of the input space.
- Score: 25.768233719182742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The accurate representation of epistemic uncertainty is a challenging yet essential task in machine learning. A widely used representation corresponds to convex sets of probabilistic predictors, also known as credal sets. One popular way of constructing these credal sets is via ensembling or specialized supervised learning methods, where the epistemic uncertainty can be quantified through measures such as the set size or the disagreement among members. In principle, these sets should contain the true data-generating distribution. As a necessary condition for this validity, we adopt the strongest notion of calibration as a proxy. Concretely, we propose a novel statistical test to determine whether there is a convex combination of the set's predictions that is calibrated in distribution. In contrast to previous methods, our framework allows the convex combination to be instance dependent, recognizing that different ensemble members may be better calibrated in different regions of the input space. Moreover, we learn this combination via proper scoring rules, which inherently optimize for calibration. Building on differentiable, kernel-based estimators of calibration errors, we introduce a nonparametric testing procedure and demonstrate the benefits of capturing instance-level variability on synthetic and real-world experiments.
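As a rough illustration of the core idea in the abstract, the sketch below learns an instance-dependent convex combination of fixed ensemble predictions by minimizing the log loss, a proper scoring rule. It is not the authors' implementation; the gating network `GatingNet`, the toy data, and all hyperparameters are illustrative assumptions.
```python
# Minimal sketch (assumed setup, not the paper's code): learn instance-dependent
# convex weights over ensemble members by minimizing a proper scoring rule.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatingNet(nn.Module):
    """Maps an input x to convex weights over the M ensemble members."""
    def __init__(self, in_dim: int, n_members: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_members)
        )

    def forward(self, x):
        return F.softmax(self.net(x), dim=-1)  # weights are nonnegative and sum to 1

def combined_prediction(weights, member_probs):
    # weights: (N, M), member_probs: (N, M, C) -> combined probabilities (N, C)
    return torch.einsum("nm,nmc->nc", weights, member_probs)

# Synthetic placeholder data: N instances, M ensemble members, C classes.
N, M, C, D = 256, 5, 3, 10
x = torch.randn(N, D)
member_probs = torch.softmax(torch.randn(N, M, C), dim=-1)  # fixed member predictions
y = torch.randint(0, C, (N,))

gate = GatingNet(D, M)
opt = torch.optim.Adam(gate.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    p = combined_prediction(gate(x), member_probs)
    loss = F.nll_loss(torch.log(p.clamp_min(1e-12)), y)  # log loss, a proper scoring rule
    loss.backward()
    opt.step()
```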
Related papers
- SConU: Selective Conformal Uncertainty in Large Language Models [59.25881667640868]
We propose a novel approach termed Selective Conformal Uncertainty (SConU).
We develop two conformal p-values that are instrumental in determining whether a given sample deviates from the uncertainty distribution of the calibration set at a specific manageable risk level.
Our approach not only facilitates rigorous management of miscoverage rates across both single-domain and interdisciplinary contexts, but also enhances the efficiency of predictions.
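For context, the sketch below computes a generic split-conformal p-value of the kind this summary refers to; the specific nonconformity scores and risk-level handling used by SConU are not reproduced, and the uniform scores are placeholders.
```python
# Minimal sketch of a (split) conformal p-value under exchangeability.
import numpy as np

def conformal_p_value(test_score: float, calibration_scores: np.ndarray) -> float:
    """p-value for the hypothesis that the test sample is exchangeable with
    the calibration set, given nonconformity scores (higher = more atypical)."""
    n = len(calibration_scores)
    return (1.0 + np.sum(calibration_scores >= test_score)) / (n + 1.0)

rng = np.random.default_rng(0)
cal_scores = rng.uniform(size=1000)          # placeholder nonconformity scores
print(conformal_p_value(0.95, cal_scores))   # small p-value -> atypical sample
```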
arXiv Detail & Related papers (2025-04-19T03:01:45Z) - Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning [53.42244686183879]
Conformal prediction provides model-agnostic and distribution-free uncertainty quantification.
Yet, conformal prediction is not reliable under poisoning attacks where adversaries manipulate both training and calibration data.
We propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning.
arXiv Detail & Related papers (2024-10-13T15:37:11Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
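A minimal sketch of the general construction behind likelihood-ratio confidence sequences, here for the mean of Bernoulli data: a candidate value is kept as long as its running likelihood ratio against a predictable plug-in estimate stays below 1/alpha (Ville's inequality). The plug-in rule and the parameter grid are illustrative choices, not the paper's.
```python
# Minimal sketch: likelihood-ratio confidence sequence for a Bernoulli mean.
import numpy as np

def lr_confidence_sequence(xs, alpha=0.05, grid=None):
    grid = np.linspace(0.01, 0.99, 99) if grid is None else grid
    log_lr = np.zeros_like(grid)   # running log likelihood ratio per candidate theta
    mean_est = 0.5                 # predictable plug-in estimate before any data
    sets = []
    for t, x in enumerate(xs, start=1):
        # log-likelihood of x under the plug-in minus under each candidate theta
        log_lr += (x * np.log(mean_est) + (1 - x) * np.log(1 - mean_est)) \
                  - (x * np.log(grid) + (1 - x) * np.log(1 - grid))
        sets.append(grid[log_lr < np.log(1.0 / alpha)])          # keep small ratios
        mean_est = np.clip(np.mean(xs[:t]), 0.01, 0.99)          # update after use
    return sets

rng = np.random.default_rng(1)
data = rng.binomial(1, 0.3, size=200)
cs = lr_confidence_sequence(data)
print(cs[-1].min(), cs[-1].max())  # endpoints of the set after 200 observations
```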
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization.
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
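As an example of a differentiable, sample-based calibration metric of this kind, the sketch below implements an unbiased estimator of the squared kernel calibration error (SKCE) for binary classification with an RBF kernel on the predicted probabilities; the kernel and bandwidth are illustrative assumptions rather than the paper's exact choices.
```python
# Minimal sketch: unbiased SKCE estimator for binary classification.
import numpy as np

def skce_unbiased(probs, labels, bandwidth=0.1):
    """probs: (N,) predicted P(y=1); labels: (N,) in {0, 1}."""
    n = len(probs)
    residual = labels - probs                  # calibration residuals y - p
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            k = np.exp(-(probs[i] - probs[j]) ** 2 / (2 * bandwidth ** 2))
            total += k * residual[i] * residual[j]
    return 2.0 * total / (n * (n - 1))         # average over all pairs i < j

rng = np.random.default_rng(0)
p = rng.uniform(size=500)
print(skce_unbiased(p, rng.binomial(1, p)))        # calibrated: near zero
print(skce_unbiased(p, rng.binomial(1, p ** 2)))   # miscalibrated: larger
```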
arXiv Detail & Related papers (2023-10-31T06:19:40Z) - Test-time Recalibration of Conformal Predictors Under Distribution Shift Based on Unlabeled Examples [30.61588337557343]
Conformal predictors provide uncertainty estimates by computing a set of classes with a user-specified probability.
We propose a method that provides excellent uncertainty estimates under natural distribution shifts.
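The following sketch shows a plain split conformal predictor for classification, matching the generic description above; the paper's test-time recalibration under distribution shift is not reproduced, and the data are synthetic placeholders.
```python
# Minimal sketch: split conformal prediction sets for classification.
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """cal_probs: (N, C) softmax outputs; returns the nonconformity threshold."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]       # nonconformity scores
    q = np.ceil((n + 1) * (1 - alpha)) / n                    # finite-sample correction
    return np.quantile(scores, min(q, 1.0), method="higher")

def prediction_set(test_probs, threshold):
    return np.where(1.0 - test_probs <= threshold)[0]         # classes kept in the set

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=1000)
cal_labels = np.array([rng.choice(5, p=p) for p in cal_probs])
thr = conformal_threshold(cal_probs, cal_labels)
print(prediction_set(rng.dirichlet(np.ones(5)), thr))
```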
arXiv Detail & Related papers (2022-10-09T04:46:00Z) - On the Calibration of Probabilistic Classifier Sets [6.759124697337311]
We extend the notion of calibration to evaluate the validity of an aleatoric uncertainty representation.
We show that ensembles of deep neural networks are often not well calibrated.
arXiv Detail & Related papers (2022-05-20T10:57:46Z) - Should Ensemble Members Be Calibrated? [16.331175260764]
Modern deep neural networks are often observed to be poorly calibrated.
Deep learning approaches make use of large numbers of model parameters.
This paper explores the application of calibration schemes to deep ensembles.
arXiv Detail & Related papers (2021-01-13T23:59:00Z) - Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
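A minimal sketch of a binning-free, KS-style calibration error for binary predictions, in the spirit of the measure described above: the maximum gap between cumulative predicted probability and cumulative empirical frequency after sorting by confidence. This follows the general idea, not the paper's exact code.
```python
# Minimal sketch: Kolmogorov-Smirnov-style calibration error (binning-free).
import numpy as np

def ks_calibration_error(probs, labels):
    order = np.argsort(probs)
    p, y = probs[order], labels[order]
    n = len(p)
    cum_pred = np.cumsum(p) / n          # cumulative predicted probability
    cum_true = np.cumsum(y) / n          # cumulative empirical frequency
    return np.max(np.abs(cum_pred - cum_true))

rng = np.random.default_rng(0)
p = rng.uniform(size=2000)
print(ks_calibration_error(p, rng.binomial(1, p)))        # calibrated: near zero
print(ks_calibration_error(p, rng.binomial(1, p ** 2)))   # miscalibrated: larger
```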
arXiv Detail & Related papers (2020-06-23T07:18:05Z) - Distribution-free binary classification: prediction sets, confidence intervals and calibration [106.50279469344937]
We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting.
We derive confidence intervals for binned probabilities for both fixed-width and uniform-mass binning.
As a consequence of our 'tripod' theorems, these confidence intervals for binned probabilities lead to distribution-free calibration.
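A rough sketch of uniform-mass binning with per-bin confidence intervals for the binned probabilities, as described in the summary; the Hoeffding-style interval width is an illustrative choice and not necessarily the bound derived in the paper.
```python
# Minimal sketch: uniform-mass binning with per-bin confidence intervals.
import numpy as np

def uniform_mass_bins(scores, n_bins):
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf   # cover the whole real line
    return edges

def binned_estimates_with_ci(scores, labels, n_bins=10, alpha=0.05):
    edges = uniform_mass_bins(scores, n_bins)
    out = []
    for b in range(n_bins):
        mask = (scores > edges[b]) & (scores <= edges[b + 1])
        n_b = int(mask.sum())
        if n_b == 0:
            continue
        p_hat = labels[mask].mean()
        half_width = np.sqrt(np.log(2.0 / alpha) / (2.0 * n_b))   # Hoeffding bound
        out.append((p_hat, max(p_hat - half_width, 0.0), min(p_hat + half_width, 1.0)))
    return out

rng = np.random.default_rng(0)
s = rng.uniform(size=5000)
y = rng.binomial(1, s)
for p_hat, lo, hi in binned_estimates_with_ci(s, y):
    print(f"{p_hat:.3f} in [{lo:.3f}, {hi:.3f}]")
```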
arXiv Detail & Related papers (2020-06-18T14:17:29Z) - Calibrate and Prune: Improving Reliability of Lottery Tickets Through Prediction Calibration [40.203492372949576]
Supervised models with uncalibrated confidences tend to be overconfident even when making wrong predictions.
We study how explicit confidence calibration in the over-parameterized network impacts the quality of the resulting lottery tickets.
Our empirical studies reveal that including calibration mechanisms consistently leads to more effective lottery tickets.
arXiv Detail & Related papers (2020-02-10T15:42:36Z)