Related papers: h-calibration: Rethinking Classifier Recalibration with Probabilistic Error-Bounded Objective

URL: http://arxiv.org/abs/2506.17968v1
Date: Sun, 22 Jun 2025 09:56:44 GMT
Title: h-calibration: Rethinking Classifier Recalibration with Probabilistic Error-Bounded Objective
Authors: Wenjian Huang, Guiping Cao, Jiahao Xia, Jingkun Chen, Hao Wang, Jianguo Zhang
Abstract summary: Deep neural networks have demonstrated remarkable performance across numerous learning tasks but often suffer from miscalibration. This has inspired many recent works on mitigating miscalibration, particularly through post-hoc recalibration methods. We propose a probabilistic learning framework for calibration called h-calibration, which theoretically constructs an equivalent learning formulation for canonical calibration with boundedness. Our method not only overcomes the ten identified limitations but also achieves markedly better performance than traditional methods.
Score: 12.903217487071172
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks have demonstrated remarkable performance across numerous learning tasks but often suffer from miscalibration, resulting in unreliable probability outputs. This has inspired many recent works on mitigating miscalibration, particularly through post-hoc recalibration methods that aim to obtain calibrated probabilities without sacrificing the classification performance of pre-trained models. In this study, we summarize and categorize previous works into three general strategies: intuitively designed methods, binning-based methods, and methods based on formulations of ideal calibration. Through theoretical and practical analysis, we highlight ten common limitations in previous approaches. To address these limitations, we propose a probabilistic learning framework for calibration called h-calibration, which theoretically constructs an equivalent learning formulation for canonical calibration with boundedness. On this basis, we design a simple yet effective post-hoc calibration algorithm. Our method not only overcomes the ten identified limitations but also achieves markedly better performance than traditional methods, as validated by extensive experiments. We further analyze, both theoretically and experimentally, the relationship and advantages of our learning objective compared to traditional proper scoring rules. In summary, our probabilistic framework derives an approximately equivalent differentiable objective for learning error-bounded calibrated probabilities, elucidating the correspondence and convergence properties of computational statistics with respect to theoretical bounds in canonical calibration. The theoretical effectiveness is verified on standard post-hoc calibration benchmarks by achieving state-of-the-art performance. This research offers a valuable reference for learning reliable likelihoods in related fields.
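
To make "post-hoc recalibration without sacrificing classification performance" concrete, here is a minimal sketch of temperature scaling, a standard baseline of this kind. It is not the h-calibration algorithm from the paper. The logits, labels, grid of candidate temperatures, and function names are all hypothetical and chosen only for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for logits, y in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[y])
    return total / len(labels)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes held-out NLL."""
    if grid is None:
        grid = [0.25 * k for k in range(1, 41)]  # 0.25, 0.50, ..., 10.0 (assumed range)
    return min(grid, key=lambda t: mean_nll(logit_rows, labels, t))

# Hypothetical held-out logits from a 3-class model, with two confident mistakes.
val_logits = [
    [6.0, 1.0, 0.0],
    [5.0, 4.0, 0.0],
    [0.0, 7.0, 1.0],
    [3.0, 0.0, 4.0],
    [0.5, 0.0, 6.0],
    [4.0, 5.0, 0.0],
]
val_labels = [0, 1, 1, 0, 2, 1]

temperature = fit_temperature(val_logits, val_labels)
calibrated = [softmax(row, temperature) for row in val_logits]
print("fitted temperature:", temperature)
```

Because every logit in a row is divided by the same positive scalar, the predicted class (the argmax) is unchanged; only the confidence attached to it moves. A grid search stands in here for the gradient-based fit that is more common in practice.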

Related papers

Uniform convergence of the smooth calibration error and its relationship with functional gradient [10.906645958268939]
This work focuses on the smooth calibration error (CE) and provides a uniform convergence bound. We analyze three representative algorithms: gradient boosting trees, kernel boosting, and two-layer neural networks. Our results offer new theoretical insights and practical guidance for designing reliable probabilistic models.
arXiv Detail & Related papers (2025-05-26T01:23:56Z)
Rethinking Early Stopping: Refine, Then Calibrate [49.966899634962374]
We present a novel variational formulation of the calibration-refinement decomposition. We provide theoretical and empirical evidence that calibration and refinement errors are not minimized simultaneously during training.
arXiv Detail & Related papers (2025-01-31T15:03:54Z)
Calibration-then-Calculation: A Variance Reduced Metric Framework in Deep Click-Through Rate Prediction Models [16.308958212406583]
There is a lack of focus on evaluating the performance of deep learning pipelines. With the increased use of large datasets and complex models, the training process is run only once and the result is compared to previous benchmarks. Traditional solutions, such as running the training process multiple times, are often infeasible due to computational constraints. We introduce a novel metric framework, the Calibrated Loss Metric, designed to address this issue by reducing the variance present in its conventional counterpart.
arXiv Detail & Related papers (2024-01-30T02:38:23Z)
Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression. These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization. We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
A Large-Scale Study of Probabilistic Calibration in Neural Network Regression [3.13468877208035]
We conduct the largest empirical study to date to assess the probabilistic calibration of neural networks. We introduce novel differentiable recalibration and regularization methods, uncovering new insights into their effectiveness.
arXiv Detail & Related papers (2023-06-05T09:33:39Z)
Distribution-Free Model-Agnostic Regression Calibration via Nonparametric Methods [9.662269016653296]
We consider an individual calibration objective for characterizing the quantiles of the prediction model. Existing methods have been largely heuristic and lack statistical guarantees in terms of individual calibration. We propose simple nonparametric calibration methods that are agnostic of the underlying prediction model.
arXiv Detail & Related papers (2023-05-20T21:31:51Z)
Sharp Calibrated Gaussian Processes [58.94710279601622]
State-of-the-art approaches for designing calibrated models rely on inflating the Gaussian process posterior variance. We present a calibration approach that generates predictive quantiles using a computation inspired by the vanilla Gaussian process posterior variance. Our approach is shown to yield a calibrated model under reasonable assumptions.
arXiv Detail & Related papers (2023-02-23T12:17:36Z)
Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression. This framework allows one to transform any regression model into a calibrated probabilistic model. We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z)
Predictive machine learning for prescriptive applications: a coupled training-validating approach [77.34726150561087]
We propose a new method for training predictive machine learning models for prescriptive applications. This approach is based on tweaking the validation step in the standard training-validating-testing scheme. Several experiments with synthetic data demonstrate promising results in reducing the prescription costs in both deterministic and real models.
arXiv Detail & Related papers (2021-10-22T15:03:20Z)
Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties. Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
Learning Prediction Intervals for Regression: Generalization and Calibration [12.576284277353606]
We study the generation of prediction intervals in regression for uncertainty quantification. We use a general learning theory to characterize the optimality-feasibility tradeoff that encompasses Lipschitz continuity and VC-subgraph classes. We empirically demonstrate the strengths of our interval generation and calibration algorithms in terms of testing performances compared to existing benchmarks.
arXiv Detail & Related papers (2021-02-26T17:55:30Z)
Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions. We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test. Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.