Beyond calibration: estimating the grouping loss of modern neural
networks
- URL: http://arxiv.org/abs/2210.16315v3
- Date: Thu, 27 Apr 2023 12:00:35 GMT
- Title: Beyond calibration: estimating the grouping loss of modern neural
networks
- Authors: Alexandre Perez-Lebel (SODA), Marine Le Morvan (SODA), Gaël Varoquaux (SODA)
- Abstract summary: Proper scoring rule theory shows that given the calibration loss, the missing piece to characterize individual errors is the grouping loss.
We show that modern neural network architectures in vision and NLP exhibit grouping loss, notably in distribution shift settings.
- Score: 68.8204255655161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to ensure that a classifier gives reliable confidence scores is
essential for informed decision-making. To this end, recent work has
focused on miscalibration, i.e., the over- or under-confidence of model scores.
Yet calibration is not enough: even a perfectly calibrated classifier with the
best possible accuracy can have confidence scores that are far from the true
posterior probabilities. This is due to the grouping loss, created by samples
with the same confidence scores but different true posterior probabilities.
Proper scoring rule theory shows that given the calibration loss, the missing
piece to characterize individual errors is the grouping loss. While there are
many estimators of the calibration loss, none exists for the grouping loss in
standard settings. Here, we propose an estimator to approximate the grouping
loss. We show that modern neural network architectures in vision and NLP
exhibit grouping loss, notably in distribution shift settings, which
highlights the importance of pre-production validation.
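For the Brier score in binary classification, this decomposition can be written explicitly. With confidence score S, label Y, true posterior Q = E[Y|X], and calibrated score C = E[Y|S], the expected loss splits as:

```latex
% Brier score decomposition (binary classification):
%   S = confidence score, Y in {0,1}, Q = E[Y|X], C = E[Y|S]
\mathbb{E}\!\left[(S - Y)^2\right]
  = \underbrace{\mathbb{E}\!\left[(S - C)^2\right]}_{\text{calibration loss}}
  + \underbrace{\mathbb{E}\!\left[(C - Q)^2\right]}_{\text{grouping loss}}
  + \underbrace{\mathbb{E}\!\left[Q(1 - Q)\right]}_{\text{irreducible loss}}
```

The grouping loss vanishes exactly when all samples that share a confidence score also share the same true posterior. As a rough illustration of how it can be estimated, a naive plug-in recipe is to bin samples by confidence, split each bin by clustering in feature space, and measure the spread of per-cluster label means. The paper's actual estimator includes a debiasing correction not reproduced here; the function and parameter names below are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def grouping_loss_lower_bound(scores, labels, features, n_bins=10, n_clusters=2):
    """Naive plug-in lower bound on the grouping loss (binary case).

    scores: (n,) confidences in [0, 1]; labels: (n,) in {0, 1};
    features: (n, d) representation used to split confidence level sets.
    """
    bins = np.clip((scores * n_bins).astype(int), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.sum() < 2 * n_clusters:
            continue  # too few samples to split this level set
        clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features[mask])
        y_bin = labels[mask]
        c_bin = y_bin.mean()  # estimate of the calibrated score E[Y | S in bin]
        for k in range(n_clusters):
            in_k = clusters == k
            if in_k.any():
                # Weighted squared deviation of the cluster's label mean
                # from the bin's calibrated score.
                total += in_k.sum() * (y_bin[in_k].mean() - c_bin) ** 2
    return total / len(scores)
```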
Related papers
- Calibrating Deep Neural Network using Euclidean Distance [5.675312975435121]
In machine learning, Focal Loss is commonly used to reduce misclassification rates by emphasizing hard-to-classify samples.
High calibration error indicates a misalignment between predicted probabilities and actual outcomes, affecting model reliability.
This research introduces a novel loss function called Focal Calibration Loss (FCL), designed to improve probability calibration while retaining the advantages of Focal Loss in handling difficult samples (a hedged sketch follows this entry).
arXiv Detail & Related papers (2024-10-23T23:06:50Z)
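As a rough sketch of the entry above: the focal term is the standard Focal Loss, while the Euclidean calibration term is an assumption based on the paper's title, not its confirmed formulation.

```python
import torch
import torch.nn.functional as F

def focal_calibration_loss(logits, targets, gamma=2.0, lam=1.0):
    """Sketch: standard focal term plus an assumed Euclidean calibration term."""
    probs = logits.softmax(dim=-1)
    p_t = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # true-class probability
    focal = -((1.0 - p_t) ** gamma) * torch.log(p_t.clamp_min(1e-12))
    one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()
    euclid = ((probs - one_hot) ** 2).sum(dim=-1)  # assumed calibration penalty
    return (focal + lam * euclid).mean()
```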
- Optimizing Calibration by Gaining Aware of Prediction Correctness [30.619608580138802]
Cross-Entropy (CE) loss is widely used for calibrator training; it encourages the model to increase confidence in the ground-truth class (the standard recipe is sketched after this entry).
We propose a new post-hoc calibration objective derived from the aim of calibration.
arXiv Detail & Related papers (2024-04-19T17:25:43Z)
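For reference, the CE-trained baseline that the entry above moves beyond is typified by temperature scaling fit with cross-entropy on held-out logits. A minimal sketch (not the paper's proposed objective):

```python
import torch

def fit_temperature(logits, labels, lr=0.01, steps=200):
    """Fit a scalar temperature by minimizing CE on a validation set."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()
```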
- Few-Shot Recalibration of Language Models [23.829795148520834]
We train a recalibration model that takes in a few unlabeled examples from any given slice and predicts a curve that remaps confidence scores to be more accurate for that slice.
Our trained model can recalibrate for arbitrary new slices, without using any labeled data from that slice.
Experiments show that our few-shot recalibrator consistently outperforms existing calibration methods.
arXiv Detail & Related papers (2024-03-27T06:25:40Z)
- Reconfidencing LLMs from the Grouping Loss Perspective [56.801251926946485]
Large Language Models (LLMs) are susceptible to generating hallucinated answers in a confident tone.
Recent findings show that controlling uncertainty must go beyond calibration.
We construct a new evaluation dataset derived from a knowledge base to assess the confidence scores given to answers from Mistral and LLaMA.
arXiv Detail & Related papers (2024-02-07T15:40:22Z)
- Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression.
These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization (one such estimate is sketched after this entry).
We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
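A minimal sketch of one member of this family for binary classification: the pairwise squared-kernel calibration estimate with an RBF kernel on the scores. The paper's metrics are more general; this only shows the differentiable, unbiased sample-estimate structure.

```python
import numpy as np

def skce(scores, labels, bandwidth=0.1):
    """Unbiased pairwise kernel calibration estimate, differentiable in scores."""
    r = labels - scores                    # residuals Y - S
    d = scores[:, None] - scores[None, :]  # pairwise score differences
    k = np.exp(-(d ** 2) / (2 * bandwidth ** 2))
    g = np.outer(r, r) * k
    n = len(scores)
    return (g.sum() - np.trace(g)) / (n * (n - 1))  # off-diagonal average
```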
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions (the alignment idea is illustrated after this entry).
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in-domain and out-of-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
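The alignment idea from the entry above, in its simplest form: penalize the squared gap between a batch's mean detection confidence and the fraction of detections that are correct. This is an illustrative stand-in for the paper's actual loss; in training it would be added to the detection loss with a small weight.

```python
import torch

def precision_confidence_gap(confidences, is_correct):
    """Squared gap between mean confidence and empirical precision (sketch)."""
    return (confidences.mean() - is_correct.float().mean()) ** 2
```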
- Bayesian Confidence Calibration for Epistemic Uncertainty Modelling [4.358626952482686]
We introduce a framework that yields confidence estimates together with an estimate of the calibration method's own uncertainty.
We achieve state-of-the-art calibration performance for object detection calibration.
arXiv Detail & Related papers (2021-09-21T10:53:16Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric, the local calibration error (LCE), that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE more than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Distribution-free binary classification: prediction sets, confidence intervals and calibration [106.50279469344937]
We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting.
We derive confidence intervals for binned probabilities for both fixed-width and uniform-mass binning (a simple per-bin construction is sketched after this entry).
As a consequence of our 'tripod' theorems, these confidence intervals for binned probabilities lead to distribution-free calibration.
arXiv Detail & Related papers (2020-06-18T14:17:29Z)
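A minimal sketch of the uniform-mass-binning case, using Hoeffding's inequality for the per-bin intervals; the paper derives tighter distribution-free intervals than this crude construction.

```python
import numpy as np

def binned_confidence_intervals(scores, labels, n_bins=10, alpha=0.1):
    """Hoeffding CIs for binned probabilities under uniform-mass binning."""
    order = np.argsort(scores)
    out = []
    for idx in np.array_split(order, n_bins):  # uniform-mass bins
        p_hat = labels[idx].mean()             # empirical bin frequency
        eps = np.sqrt(np.log(2 / alpha) / (2 * len(idx)))  # Hoeffding half-width
        out.append((max(0.0, p_hat - eps), min(1.0, p_hat + eps)))
    return out
```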