A Variational Estimator for $L_p$ Calibration Errors
- URL: http://arxiv.org/abs/2602.24230v1
- Date: Fri, 27 Feb 2026 17:56:52 GMT
- Title: A Variational Estimator for $L_p$ Calibration Errors
- Authors: Eugène Berta, Sacha Braun, David Holzmüller, Francis Bach, Michael I. Jordan
- Abstract summary: We show how to extend a recent variational framework for estimating calibration errors beyond divergences induced by proper losses, to cover a broad class of calibration errors induced by $L_p$ divergences.
- Score: 44.81527473428586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Calibration, the problem of ensuring that predicted probabilities align with observed class frequencies, is a basic desideratum for reliable prediction with machine learning systems. Calibration error is traditionally assessed via a divergence function, using the expected divergence between predictions and empirical frequencies. Accurately estimating this quantity is challenging, especially in the multiclass setting. Here, we show how to extend a recent variational framework for estimating calibration errors beyond divergences induced by proper losses, to cover a broad class of calibration errors induced by $L_p$ divergences. Our method can separate over- and under-confidence and, unlike non-variational approaches, avoids overestimation. We provide extensive experiments and integrate our code in the open-source package probmetrics (https://github.com/dholzmueller/probmetrics) for evaluating calibration errors.
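The listing does not spell out the variational estimator itself; the authors' implementation is in the linked probmetrics package. For orientation, here is a minimal sketch of the familiar plugin (binned) style of $L_p$ estimator that the abstract contrasts with. The top-label reduction, equal-width binning, and function name are illustrative choices, not the paper's method; binned plugin estimators of this kind are known to be biased, which is the overestimation issue the variational approach avoids.

```python
import numpy as np

def binned_lp_calibration_error(probs, labels, n_bins=15, p=2):
    """Plugin (binned) estimate of a top-label L_p calibration error:
    a weighted average of |accuracy - confidence|^p over confidence
    bins, followed by a p-th root. p = 1 recovers the familiar ECE."""
    conf = probs.max(axis=1)                            # top-label confidence
    correct = (probs.argmax(axis=1) == labels).astype(float)
    bin_ids = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    n, total = len(probs), 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            total += (mask.sum() / n) * gap ** p
    return total ** (1 / p)

# Example on random predictions (expect a clearly nonzero error):
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, 3, size=1000)
print(binned_lp_calibration_error(probs, labels, p=2))
```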
Related papers
- Scalable Utility-Aware Multiclass Calibration [53.28176049547449]
Utility calibration is a general framework that measures the calibration error relative to a specific utility function.
We demonstrate how this framework can unify and re-interpret several existing calibration metrics.
arXiv Detail & Related papers (2025-10-29T12:32:14Z)
- Enforcing Calibration in Multi-Output Probabilistic Regression with Pre-rank Regularization [4.065502917666599]
We introduce a general regularization framework to enforce multivariate calibration during training for arbitrary pre-rank functions.
We show that our methods significantly improve calibration across all pre-rank functions without sacrificing predictive accuracy.
arXiv Detail & Related papers (2025-10-24T09:16:12Z)
- Rethinking Early Stopping: Refine, Then Calibrate [49.966899634962374]
We present a novel variational formulation of the calibration-refinement decomposition (a standard form of the decomposition is sketched after this entry).
We provide theoretical and empirical evidence that calibration and refinement errors are not minimized simultaneously during training.
arXiv Detail & Related papers (2025-01-31T15:03:54Z)
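For context, the calibration-refinement decomposition referenced in the entry above has a classical form. For a proper loss $\ell$, with calibration map $c(q) = \mathbb{E}[Y \mid f(X) = q]$ and $d$ the divergence induced by $\ell$, a standard statement (notation mine, not quoted from the paper) is:

$$\mathbb{E}\big[\ell(f(X), Y)\big] \;=\; \underbrace{\mathbb{E}\big[d\big(c(f(X)),\, f(X)\big)\big]}_{\text{calibration error}} \;+\; \underbrace{\mathbb{E}\big[\ell\big(c(f(X)),\, Y\big)\big]}_{\text{refinement error}}$$

Roughly, the variational viewpoint rewrites the calibration term as the largest loss reduction achievable by post-processing the predictions, which is what makes it estimable by fitting a recalibration map.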
- Reassessing How to Compare and Improve the Calibration of Machine Learning Models [7.183341902583164]
A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction.
We show that there exist trivial recalibration approaches that can appear state-of-the-art unless calibration and prediction metrics are accompanied by additional generalization metrics.
arXiv Detail & Related papers (2024-06-06T13:33:45Z)
- Orthogonal Causal Calibration [55.28164682911196]
We develop general algorithms for reducing the task of causal calibration to that of calibrating a standard (non-causal) predictive model.
Our results are exceedingly general, showing that essentially any existing calibration algorithm can be used in causal settings.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
- Consistent and Asymptotically Unbiased Estimation of Proper Calibration Errors [23.819464242327257]
We propose a method that allows consistent estimation of all proper calibration errors and refinement terms.
We prove the relation between refinement and f-divergences, which implies information monotonicity in neural networks.
Our experiments validate the claimed properties of the proposed estimator and suggest that the selection of a post-hoc calibration method should be determined by the particular calibration error of interest.
arXiv Detail & Related papers (2023-12-14T01:20:08Z)
- A Consistent and Differentiable Lp Canonical Calibration Error Estimator [21.67616079217758]
Deep neural networks are poorly calibrated and tend to output overconfident predictions.
We propose a low-bias, trainable calibration error estimator based on Dirichlet kernel density estimates (a rough sketch of the idea follows this entry).
Our method has a natural choice of kernel, and can be used to generate consistent estimates of other quantities.
arXiv Detail & Related papers (2022-10-13T15:11:11Z)
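As a rough illustration of the general idea in the entry above (a kernel-regression estimate of $\mathbb{E}[y \mid f(x)]$ on the simplex, averaged into an $L_p$ calibration error), here is a minimal leave-one-out sketch. This is not the authors' exact estimator; the bandwidth $h$, the clipping, and all names are assumptions for the example.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_log_kernel(u, v, h):
    """log-density of Dirichlet(alpha) at u, with alpha = v/h + 1,
    i.e. a kernel on the simplex centred at v with bandwidth h."""
    alpha = v / h + 1.0
    return (gammaln(alpha.sum()) - gammaln(alpha).sum()
            + ((alpha - 1.0) * np.log(u)).sum())

def kde_lp_calibration_error(probs, onehot, h=0.1, p=2):
    """Leave-one-out Dirichlet-kernel regression estimate of
    E[y | f(x)], plugged into a sample-average L_p calibration error.
    probs: (n, K) array on the simplex; onehot: (n, K) label array.
    O(n^2) as written; fine for a sketch."""
    probs = np.clip(probs, 1e-12, 1.0)          # keep log well-defined
    n = len(probs)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logw = np.array([dirichlet_log_kernel(probs[j], probs[i], h)
                         for j in others])
        w = np.exp(logw - logw.max())           # stabilised weights
        w /= w.sum()
        y_given_f = w @ onehot[others]          # estimated E[y | f_i]
        total += (np.abs(y_given_f - probs[i]) ** p).sum()
    return (total / n) ** (1 / p)
```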
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the localized calibration error (LCE) more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test (a minimal version is sketched after this entry).
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
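For the binary case, a KS-style calibration error has a compact binning-free form: order samples by predicted probability and take the largest absolute cumulative gap between labels and predictions. A minimal sketch along those lines (the paper's spline-based recalibration itself is not shown; names are illustrative):

```python
import numpy as np

def ks_calibration_error(p_pos, y):
    """Binning-free KS calibration error for binary classification:
    the maximum absolute cumulative difference between observed labels
    and predicted probabilities, with samples sorted by prediction."""
    order = np.argsort(p_pos)
    gaps = np.cumsum(y[order] - p_pos[order]) / len(p_pos)
    return float(np.abs(gaps).max())
```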