Related papers: The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration

The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration

URL: http://arxiv.org/abs/2111.15430v4
Date: Wed, 5 Jul 2023 07:10:35 GMT
Title: The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration
Authors: Bingyuan Liu, Ismail Ben Ayed, Adrian Galdran, Jose Dolz
Abstract summary: In spite of the dominant performances of deep neural networks, recent works have shown that they are poorly calibrated. We provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses. We propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances.
Score: 21.63888208442176
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In spite of the dominant performances of deep neural networks, recent works have shown that they are poorly calibrated, resulting in over-confident predictions. Miscalibration can be exacerbated by overfitting due to the minimization of the cross-entropy during training, as it promotes the predicted softmax probabilities to match the one-hot label assignments. This yields a pre-softmax activation of the correct class that is significantly larger than the remaining activations. Recent evidence from the literature suggests that loss functions that embed implicit or explicit maximization of the entropy of predictions yield state-of-the-art calibration performances. We provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses. Specifically, these losses could be viewed as approximations of a linear penalty (or a Lagrangian) imposing equality constraints on logit distances. This points to an important limitation of such underlying equality constraints, whose ensuing gradients constantly push towards a non-informative solution, which might prevent from reaching the best compromise between the discriminative performance and calibration of the model during gradient-based optimization. Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. Comprehensive experiments on a variety of image classification, semantic segmentation and NLP benchmarks demonstrate that our method sets novel state-of-the-art results on these tasks in terms of network calibration, without affecting the discriminative performance. The code is available at https://github.com/by-liu/MbLS .

Related papers

Instance-Wise Monotonic Calibration by Constrained Transformation [7.331937231993605]
A common approach for calibration is fitting a post-hoc calibration map on unseen validation data.<n>Most existing post-hoc calibration methods do not guarantee monotonicity.<n>We propose a family of novel monotonic post-hoc calibration methods.
arXiv Detail & Related papers (2025-07-09T03:32:49Z)
Sample Margin-Aware Recalibration of Temperature Scaling [20.87493013833571]
Recent advances in deep learning have significantly improved predictive accuracy.<n>Modern neural networks remain systematically overconfident, posing risks for deployment in safety-critical scenarios.<n>We propose a lightweight, data-efficient recalibration method that precisely scales logits based on the margin between the top two logits.
arXiv Detail & Related papers (2025-06-30T03:35:05Z)
Practical estimation of the optimal classification error with soft labels and calibration [52.1410307583181]
We extend a previous work that utilizes soft labels for estimating the Bayes error, the optimal error rate.<n>We tackle a more challenging problem setting: estimation with corrupted soft labels.<n>Our method is instance-free, i.e., we do not assume access to any input instances.
arXiv Detail & Related papers (2025-05-27T06:04:57Z)
Unconstrained Monotonic Calibration of Predictions in Deep Ranking Systems [29.90543561470141]
A ranking model's absolute values are essential for certain downstream tasks. Existing calibration approaches typically employ predefined transformation functions with order-preserving properties to adjust the original predictions. We propose implementing a calibrator using an Unconstrained Monotonic Neural Network (UMNN), which can learn arbitrary monotonic functions. This approach significantly relaxes the constraints on the calibrator, improving its flexibility and expressiveness while avoiding excessively distorting the original predictions.
arXiv Detail & Related papers (2025-04-19T09:35:11Z)
Calibrating Deep Neural Network using Euclidean Distance [5.675312975435121]
In machine learning, Focal Loss is commonly used to reduce misclassification rates by emphasizing hard-to-classify samples. High calibration error indicates a misalignment between predicted probabilities and actual outcomes, affecting model reliability. This research introduces a novel loss function called Focal Loss (FCL), designed to improve probability calibration while retaining the advantages of Focal Loss in handling difficult samples.
arXiv Detail & Related papers (2024-10-23T23:06:50Z)
Orthogonal Causal Calibration [55.28164682911196]
We develop general algorithms for reducing the task of causal calibration to that of calibrating a standard (non-causal) predictive model. Our results are exceedingly general, showing that essentially any existing calibration algorithm can be used in causal settings.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
Dirichlet-Based Prediction Calibration for Learning with Noisy Labels [40.78497779769083]
Learning with noisy labels can significantly hinder the generalization performance of deep neural networks (DNNs) Existing approaches address this issue through loss correction or example selection methods. We propose the textitDirichlet-based Prediction (DPC) method as a solution.
arXiv Detail & Related papers (2024-01-13T12:33:04Z)
Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders. Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency. We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
Implicit Bayesian meta-learning (iBaML) method broadens the scope of learnable priors, but also quantifies the associated uncertainty. Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accurateness of predictions. Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
Suboptimality analysis of receding horizon quadratic control with unknown linear systems and its applications in learning-based control [14.279848166377668]
We analyze how the trade-off between the modeling error, the terminal value function error, and the prediction horizon affects the performance of a nominal receding-horizon linear quadratic (LQ) controller. We show that when an infinite horizon is desired, a finite prediction horizon that is larger than the controllability index can be sufficient for achieving a near-optimal performance.
arXiv Detail & Related papers (2023-01-19T04:33:19Z)
Calibrating Segmentation Networks with Margin-based Label Smoothing [19.669173092632]
We provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses. These losses could be viewed as approximations of a linear penalty imposing equality constraints on logit distances. We propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances.
arXiv Detail & Related papers (2022-09-09T20:21:03Z)
Improving Generalization via Uncertainty Driven Perturbations [107.45752065285821]
We consider uncertainty-driven perturbations of the training data points. Unlike loss-driven perturbations, uncertainty-guided perturbations do not cross the decision boundary. We show that UDP is guaranteed to achieve the robustness margin decision on linear models.
arXiv Detail & Related papers (2022-02-11T16:22:08Z)
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective. We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
Dissecting Supervised Constrastive Learning [24.984074794337157]
Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks. We show that one can directly optimize the encoder instead, to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective.
arXiv Detail & Related papers (2021-02-17T15:22:38Z)
Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation. Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle. We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.