Related papers: Rethinking Early Stopping: Refine, Then Calibrate

URL: http://arxiv.org/abs/2501.19195v2
Date: Wed, 25 Jun 2025 16:24:12 GMT
Title: Rethinking Early Stopping: Refine, Then Calibrate
Authors: Eugène Berta, David Holzmüller, Michael I. Jordan, Francis Bach
Abstract summary: We present a novel variational formulation of the calibration-refinement decomposition. We provide theoretical and empirical evidence that calibration and refinement errors are not minimized simultaneously during training.
Score: 49.966899634962374
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine learning classifiers often produce probabilistic predictions that are critical for accurate and interpretable decision-making in various domains. The quality of these predictions is generally evaluated with proper losses, such as cross-entropy, which decompose into two components: calibration error assesses general under/overconfidence, while refinement error measures the ability to distinguish different classes. In this paper, we present a novel variational formulation of the calibration-refinement decomposition that sheds new light on post-hoc calibration, and enables rapid estimation of the different terms. Equipped with this new perspective, we provide theoretical and empirical evidence that calibration and refinement errors are not minimized simultaneously during training. Selecting the best epoch based on validation loss thus leads to a compromise point that is suboptimal for both terms. To address this, we propose minimizing refinement error only during training (Refine,...), before minimizing calibration error post hoc, using standard techniques (...then Calibrate). Our method integrates seamlessly with any classifier and consistently improves performance across diverse classification tasks.
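
The selection rule described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes temperature scaling as the post-hoc calibrator, uses the validation loss after temperature scaling as a stand-in for refinement error, and treats the drop in loss as a stand-in for calibration error. The function names (fit_temperature, decompose, select_epoch) and the grid search over temperatures are hypothetical choices made for illustration.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the true labels.
    log_probs = log_softmax(logits)
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def fit_temperature(logits, labels, grid=None):
    # Assumption: a plain grid search over temperatures; the grid contains
    # T = 1, so the fitted loss is never worse than the raw loss.
    if grid is None:
        grid = np.logspace(-2, 2, 201)
    losses = [cross_entropy(logits / t, labels) for t in grid]
    best = int(np.argmin(losses))
    return float(grid[best]), losses[best]

def decompose(logits, labels):
    # Proxy decomposition (an assumption of this sketch):
    #   refinement  ~ loss after temperature scaling
    #   calibration ~ raw loss minus loss after temperature scaling
    raw = cross_entropy(logits, labels)
    temperature, recalibrated = fit_temperature(logits, labels)
    return {
        "loss": raw,
        "refinement_proxy": recalibrated,
        "calibration_proxy": raw - recalibrated,
        "temperature": temperature,
    }

def select_epoch(val_logits_per_epoch, val_labels):
    # Compare the usual rule (lowest validation loss) with the
    # "refine, then calibrate" rule (lowest refinement proxy).
    stats = [decompose(logits, val_labels) for logits in val_logits_per_epoch]
    best_by_loss = int(np.argmin([s["loss"] for s in stats]))
    best_by_refinement = int(np.argmin([s["refinement_proxy"] for s in stats]))
    return best_by_loss, best_by_refinement, stats
```

Given one array of validation logits per saved checkpoint, select_epoch returns the epoch chosen by raw validation loss and the epoch chosen by the refinement proxy; under this sketch, the second checkpoint would then be recalibrated with its fitted temperature before use.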

Related papers

Reassessing How to Compare and Improve the Calibration of Machine Learning Models [7.183341902583164]
A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction. We show that there exist trivial recalibration approaches that can appear seemingly state-of-the-art unless calibration and prediction metrics are accompanied by additional generalization metrics.
arXiv Detail & Related papers (2024-06-06T13:33:45Z)
Orthogonal Causal Calibration [55.28164682911196]
We develop general algorithms for reducing the task of causal calibration to that of calibrating a standard (non-causal) predictive model. Our results are exceedingly general, showing that essentially any existing calibration algorithm can be used in causal settings.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
Optimizing Calibration by Gaining Aware of Prediction Correctness [30.619608580138802]
Cross-Entropy (CE) loss is widely used for calibrator training, which pushes the model to increase confidence on the ground truth class. We propose a new post-hoc calibration objective derived from the aim of calibration.
arXiv Detail & Related papers (2024-04-19T17:25:43Z)
Consistent and Asymptotically Unbiased Estimation of Proper Calibration Errors [23.819464242327257]
We propose a method that allows consistent estimation of all proper calibration errors and refinement terms. We prove the relation between refinement and f-divergences, which implies information monotonicity in neural networks. Our experiments validate the claimed properties of the proposed estimator and suggest that the selection of a post-hoc calibration method should be determined by the particular calibration error of interest.
arXiv Detail & Related papers (2023-12-14T01:20:08Z)
Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression. These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization. We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
Scaling of Class-wise Training Losses for Post-hoc Calibration [6.0632746602205865]
We propose a new calibration method to synchronize the class-wise training losses. We design a new training loss to alleviate the variance of class-wise training losses by using multiple class-wise scaling factors. We validate the proposed framework by employing it in the various post-hoc calibration methods.
arXiv Detail & Related papers (2023-06-19T14:59:37Z)
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions. Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in-domain and out-of-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks. We analyze the problem statement, calibration definitions, and different approaches to evaluation. Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z)
Multi-Head Multi-Loss Model Calibration [13.841172927454204]
We introduce a form of simplified ensembling that bypasses the costly training and inference of deep ensembles. Specifically, each head is trained to minimize a weighted Cross-Entropy loss, but the weights are different among the different branches. We show that the resulting averaged predictions can achieve excellent calibration without sacrificing accuracy in two challenging datasets.
arXiv Detail & Related papers (2023-03-02T09:32:32Z)
Beyond calibration: estimating the grouping loss of modern neural networks [68.8204255655161]
Proper scoring rule theory shows that given the calibration loss, the missing piece to characterize individual errors is the grouping loss. We show that modern neural network architectures in vision and NLP exhibit grouping loss, notably in distribution-shift settings.
arXiv Detail & Related papers (2022-10-28T07:04:20Z)
Class-wise and reduced calibration methods [0.0]
First, we show how a reduced calibration method transforms the original problem into a simpler one. Second, we propose class-wise calibration methods that build on a phenomenon called neural collapse. Applying the two methods together results in class-wise reduced calibration algorithms, which are powerful tools for reducing the prediction and per-class calibration errors.
arXiv Detail & Related papers (2022-10-07T17:13:17Z)
Better Uncertainty Calibration via Proper Scores for Classification and Beyond [15.981380319863527]
We introduce the framework of proper calibration errors, which relates every calibration error to a proper score. This relationship can be used to reliably quantify the model calibration improvement.
arXiv Detail & Related papers (2022-03-15T12:46:08Z)
Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration. We then introduce a localized recalibration method, LoRe, that improves the LCE better than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep ensembles do not necessarily lead to improved calibration properties. We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models. This text examines the interplay between three of the simplest and most commonly used approaches to leverage deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z)
Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it. We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.