Multiclass Calibration Assessment and Recalibration of Probability Predictions via the Linear Log Odds Calibration Function
- URL: http://arxiv.org/abs/2602.18573v1
- Date: Fri, 20 Feb 2026 19:29:07 GMT
- Title: Multiclass Calibration Assessment and Recalibration of Probability Predictions via the Linear Log Odds Calibration Function
- Authors: Amy Vennos, Xin Xing, Christopher T. Franck,
- Abstract summary: We propose Multicategory Linear Log Odds (MCLLO) recalibration.<n>MCLLO includes a likelihood ratio hypothesis test to assess calibration.<n>We demonstrate the effectiveness of the MCLLO method through simulations and three real-world case studies.
- Score: 1.1636660052962862
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine-generated probability predictions are essential in modern classification tasks such as image classification. A model is well calibrated when its predicted probabilities correspond to observed event frequencies. Despite the need for multicategory recalibration methods, existing methods are limited to (i) comparing calibration between two or more models rather than directly assessing the calibration of a single model, (ii) requiring under-the-hood model access, e.g., accessing logit-scale predictions within the layers of a neural network, and (iii) providing output which is difficult for human analysts to understand. To overcome (i)-(iii), we propose Multicategory Linear Log Odds (MCLLO) recalibration, which (i) includes a likelihood ratio hypothesis test to assess calibration, (ii) does not require under-the-hood access to models and is thus applicable on a wide range of classification problems, and (iii) can be easily interpreted. We demonstrate the effectiveness of the MCLLO method through simulations and three real-world case studies involving image classification via convolutional neural network, obesity analysis via random forest, and ecology via regression modeling. We compare MCLLO to four comparator recalibration techniques utilizing both our hypothesis test and the existing calibration metric Expected Calibration Error to show that our method works well alone and in concert with other methods.
Related papers
- Improving Multi-Class Calibration through Normalization-Aware Isotonic Techniques [3.2514496966247535]
We propose novel isotonic normalization-aware techniques for multiclass calibration.<n>Unlike prior approaches, our methods inherently account for probability normalization.<n>Our approach consistently improves negative log-likelihood (NLL) and expected calibration error (ECE) metrics.
arXiv Detail & Related papers (2025-12-09T19:15:19Z) - Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z) - On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness have impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z) - Calibration tests beyond classification [30.616624345970973]
Most supervised machine learning tasks are subject to irreducible prediction errors.
Probabilistic predictive models address this limitation by providing probability distributions that represent a belief over plausible targets.
Calibrated models guarantee that the predictions are neither over- nor under-confident.
arXiv Detail & Related papers (2022-10-21T09:49:57Z) - Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z) - On the Calibration of Pre-trained Language Models using Mixup Guided by
Area Under the Margin and Saliency [47.90235939359225]
We propose a novel mixup strategy for pre-trained language models that improves model calibration further.
Our method achieves the lowest expected calibration error compared to strong baselines on both in-domain and out-of-domain test samples.
arXiv Detail & Related papers (2022-03-14T23:45:08Z) - T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem.
detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions.
We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $ell$-Expected Error (ECE)
arXiv Detail & Related papers (2022-03-03T16:58:54Z) - Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z) - Quantile Regularization: Towards Implicit Calibration of Regression
Models [30.872605139672086]
We present a method for calibrating regression models based on a novel quantile regularizer defined as the cumulative KL divergence between two CDFs.
We show that the proposed quantile regularizer significantly improves calibration for regression models trained using approaches, such as Dropout VI and Deep Ensembles.
arXiv Detail & Related papers (2020-02-28T16:53:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.