The Devil is in the Margin: Margin-based Label Smoothing for Network
Calibration
- URL: http://arxiv.org/abs/2111.15430v4
- Date: Wed, 5 Jul 2023 07:10:35 GMT
- Title: The Devil is in the Margin: Margin-based Label Smoothing for Network
Calibration
- Authors: Bingyuan Liu, Ismail Ben Ayed, Adrian Galdran, Jose Dolz
- Abstract summary: In spite of the dominant performances of deep neural networks, recent works have shown that they are poorly calibrated.
We provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses.
We propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances.
- Score: 21.63888208442176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In spite of the dominant performances of deep neural networks, recent works
have shown that they are poorly calibrated, resulting in over-confident
predictions. Miscalibration can be exacerbated by overfitting due to the
minimization of the cross-entropy during training, as it promotes the predicted
softmax probabilities to match the one-hot label assignments. This yields a
pre-softmax activation of the correct class that is significantly larger than
the remaining activations. Recent evidence from the literature suggests that
loss functions that embed implicit or explicit maximization of the entropy of
predictions yield state-of-the-art calibration performances. We provide a
unifying constrained-optimization perspective of current state-of-the-art
calibration losses. Specifically, these losses could be viewed as
approximations of a linear penalty (or a Lagrangian) imposing equality
constraints on logit distances. This points to an important limitation of such
underlying equality constraints, whose ensuing gradients constantly push
towards a non-informative solution, which might prevent from reaching the best
compromise between the discriminative performance and calibration of the model
during gradient-based optimization. Following our observations, we propose a
simple and flexible generalization based on inequality constraints, which
imposes a controllable margin on logit distances. Comprehensive experiments on
a variety of image classification, semantic segmentation and NLP benchmarks
demonstrate that our method sets novel state-of-the-art results on these tasks
in terms of network calibration, without affecting the discriminative
performance. The code is available at https://github.com/by-liu/MbLS .
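The margin idea in the abstract can be sketched in a few lines. This is a minimal NumPy illustration and not the authors' implementation (see the repository linked above for that). It assumes the "logit distance" of class k is d_k = max_j l_j - l_k, and that the inequality constraint d_k <= m is enforced with a hinge penalty added to the cross-entropy. The function names and the default values of `margin` and `weight` are hypothetical, chosen only for illustration.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def margin_penalty(logits, margin):
    # Assumed logit distances: d_k = max_j l_j - l_k (always >= 0).
    distances = logits.max(axis=1, keepdims=True) - logits
    # Hinge on the inequality constraint d_k <= margin:
    # distances already inside the margin contribute nothing.
    return np.maximum(distances - margin, 0.0).sum(axis=1)


def mbls_loss(logits, labels, margin=10.0, weight=0.1):
    # Cross-entropy plus the weighted margin penalty, averaged over the batch.
    # `margin` and `weight` defaults are illustrative assumptions.
    ce = -log_softmax(logits)[np.arange(len(labels)), labels]
    return float((ce + weight * margin_penalty(logits, margin)).mean())


if __name__ == "__main__":
    logits = np.array([[12.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
    labels = np.array([0, 1])
    print(margin_penalty(logits, margin=10.0))
    print(mbls_loss(logits, labels))
```

Setting `margin=0.0` penalizes every nonzero logit distance, which corresponds to the equality-constraint view the abstract attributes to existing calibration losses. With a positive margin, the penalty and its gradient vanish once all distances fall within the margin.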
Related papers
- Calibrating Deep Neural Network using Euclidean Distance [5.675312975435121]
In machine learning, Focal Loss is commonly used to reduce misclassification rates by emphasizing hard-to-classify samples.
High calibration error indicates a misalignment between predicted probabilities and actual outcomes, affecting model reliability.
This research introduces a novel loss function called Focal Calibration Loss (FCL), designed to improve probability calibration while retaining the advantages of Focal Loss in handling difficult samples.
arXiv Detail & Related papers (2024-10-23T23:06:50Z)
- Dirichlet-Based Prediction Calibration for Learning with Noisy Labels [40.78497779769083]
Learning with noisy labels can significantly hinder the generalization performance of deep neural networks (DNNs).
Existing approaches address this issue through loss correction or example selection methods.
We propose the Dirichlet-based Prediction Calibration (DPC) method as a solution.
arXiv Detail & Related papers (2024-01-13T12:33:04Z)
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors, but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accurateness of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in-domain and out-of-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Suboptimality analysis of receding horizon quadratic control with unknown linear systems and its applications in learning-based control [14.279848166377668]
We analyze how the trade-off between the modeling error, the terminal value function error, and the prediction horizon affects the performance of a nominal receding-horizon linear quadratic (LQ) controller.
We show that when an infinite horizon is desired, a finite prediction horizon that is larger than the controllability index can be sufficient for achieving a near-optimal performance.
arXiv Detail & Related papers (2023-01-19T04:33:19Z)
- Calibrating Segmentation Networks with Margin-based Label Smoothing [19.669173092632]
We provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses.
These losses could be viewed as approximations of a linear penalty imposing equality constraints on logit distances.
We propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances.
arXiv Detail & Related papers (2022-09-09T20:21:03Z)
- Improving Generalization via Uncertainty Driven Perturbations [107.45752065285821]
We consider uncertainty-driven perturbations (UDP) of the training data points.
Unlike loss-driven perturbations, uncertainty-guided perturbations do not cross the decision boundary.
We show that UDP is guaranteed to achieve the maximum-margin decision boundary on linear models.
arXiv Detail & Related papers (2022-02-11T16:22:08Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Dissecting Supervised Constrastive Learning [24.984074794337157]
Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks.
We show that one can directly optimize the encoder instead, to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective.
arXiv Detail & Related papers (2021-02-17T15:22:38Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.