Multiclass Local Calibration With the Jensen-Shannon Distance
- URL: http://arxiv.org/abs/2510.26566v1
- Date: Thu, 30 Oct 2025 14:56:07 GMT
- Title: Multiclass Local Calibration With the Jensen-Shannon Distance
- Authors: Cesare Barbera, Lorenzo Perini, Giovanni De Toni, Andrea Passerini, Andrea Pugnana
- Abstract summary: Current approaches to multiclass calibration lack a notion of distance among inputs. This is especially relevant in high-stakes settings, such as healthcare, where sparse instances are exactly those most at risk of biased treatment. We propose a practical method for enhancing local calibration in Neural Networks, which enforces alignment between predicted probabilities and local estimates of class frequencies.
- Score: 16.08047787133007
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Developing trustworthy Machine Learning (ML) models requires their predicted probabilities to be well-calibrated, meaning they should reflect true-class frequencies. Among calibration notions in multiclass classification, strong calibration is the most stringent, as it requires all predicted probabilities to be simultaneously calibrated across all classes. However, existing approaches to multiclass calibration lack a notion of distance among inputs, which makes them vulnerable to proximity bias: predictions in sparse regions of the feature space are systematically miscalibrated. This is especially relevant in high-stakes settings, such as healthcare, where the sparse instances are exactly those most at risk of biased treatment. In this work, we address this main shortcoming by introducing a local perspective on multiclass calibration. First, we formally define multiclass local calibration and establish its relationship with strong calibration. Second, we theoretically analyze the pitfalls of existing evaluation metrics when applied to multiclass local calibration. Third, we propose a practical method for enhancing local calibration in Neural Networks, which enforces alignment between predicted probabilities and local estimates of class frequencies using the Jensen-Shannon distance. Finally, we empirically validate our approach against existing multiclass calibration techniques.
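The core idea described in the abstract, aligning a network's predicted probabilities with local (neighbourhood-based) estimates of class frequencies via the Jensen-Shannon distance, can be sketched as follows. This is an illustrative reconstruction from the abstract alone, not the authors' implementation: the k-nearest-neighbour frequency estimator and the function names are assumptions.

```python
import numpy as np

def js_distance(p, q, eps=1e-12):
    """Jensen-Shannon distance between two discrete distributions (nats).
    The JS divergence is the mean KL divergence to the mixture m;
    its square root is a proper metric."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return np.sqrt(jsd)

def local_frequency_estimate(X, Y_onehot, x, k=5):
    """Hypothetical local estimate of class frequencies: the mean one-hot
    label over the k nearest neighbours of x in feature space."""
    dists = np.linalg.norm(X - x, axis=1)
    nn = np.argsort(dists)[:k]
    return Y_onehot[nn].mean(axis=0)
```

A training-time regularizer in the spirit of the paper would then penalize `js_distance(model(x), local_frequency_estimate(X, Y_onehot, x))` alongside the usual loss; the exact loss composition is not specified in the abstract.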
Related papers
- Scalable Utility-Aware Multiclass Calibration [53.28176049547449]
Utility calibration is a general framework that measures the calibration error relative to a specific utility function.
We demonstrate how this framework can unify and re-interpret several existing calibration metrics.
arXiv Detail & Related papers (2025-10-29T12:32:14Z) - Rethinking Early Stopping: Refine, Then Calibrate [49.966899634962374]
We present a novel variational formulation of the calibration-refinement decomposition.
We provide theoretical and empirical evidence that calibration and refinement errors are not minimized simultaneously during training.
arXiv Detail & Related papers (2025-01-31T15:03:54Z) - Confidence Calibration of Classifiers with Many Classes [5.018156030818883]
For classification models based on neural networks, the maximum predicted class probability is often used as a confidence score.
This score is rarely a good estimate of the probability of a correct prediction, and typically requires a post-processing calibration step.
arXiv Detail & Related papers (2024-11-05T10:51:01Z) - Proximity-Informed Calibration for Deep Neural Networks [49.330703634912915]
ProCal is a plug-and-play algorithm with a theoretical guarantee to adjust sample confidence based on proximity.
We show that ProCal is effective in addressing proximity bias and improving calibration on balanced, long-tail, and distribution-shift settings.
arXiv Detail & Related papers (2023-06-07T16:40:51Z) - Hidden Heterogeneity: When to Choose Similarity-Based Calibration [12.788224825185633]
Black-box calibration methods are unable to detect subpopulations where calibration could improve prediction accuracy.
The paper proposes a quantitative measure for hidden heterogeneity (HH)
Experiments show that the improvements in calibration achieved by similarity-based calibration methods correlate with the amount of HH present and, given sufficient calibration data, generally exceed calibration achieved by global methods.
arXiv Detail & Related papers (2022-02-03T20:43:25Z) - Meta-Cal: Well-controlled Post-hoc Calibration by Ranking [23.253020991581963]
Post-hoc calibration recalibrates a trained model by learning a calibration map.
Existing approaches mostly focus on constructing calibration maps with low calibration errors.
We study post-hoc calibration for multi-class classification under constraints, as a calibrator with a low calibration error does not necessarily mean it is useful in practice.
arXiv Detail & Related papers (2021-05-10T12:00:54Z) - Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the localized calibration error (LCE) more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z) - Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z) - Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning [8.780958735684958]
Post-hoc multi-class calibration is a common approach for providing confidence estimates of deep neural network predictions.
Recent work has shown that widely used scaling methods underestimate their calibration error.
We propose a shared class-wise (sCW) calibration strategy, sharing one calibrator among similar classes.
arXiv Detail & Related papers (2020-06-23T15:31:59Z) - Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
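The binning-free KS measure mentioned in the last blurb compares two empirical cumulative distributions: cumulative predicted confidence versus cumulative empirical accuracy over sorted predictions. A minimal sketch of that idea follows; it is an illustrative reading of the abstract, not necessarily the paper's exact estimator.

```python
import numpy as np

def ks_calibration_error(confidences, correct):
    """Kolmogorov-Smirnov-style calibration error: the maximum gap
    between cumulative predicted confidence and cumulative accuracy,
    computed over predictions sorted by confidence (no binning)."""
    order = np.argsort(confidences)
    conf = np.asarray(confidences, dtype=float)[order]
    corr = np.asarray(correct, dtype=float)[order]
    n = len(conf)
    cum_conf = np.cumsum(conf) / n   # empirical CDF-like curve of confidence
    cum_acc = np.cumsum(corr) / n    # cumulative fraction of correct predictions
    return np.max(np.abs(cum_conf - cum_acc))
```

A perfectly calibrated model drives the two cumulative curves together, so the statistic goes to zero; a model that is confidently wrong everywhere pushes it toward one.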
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.