Beyond One-Hot Labels: Semantic Mixing for Model Calibration
- URL: http://arxiv.org/abs/2504.13548v1
- Date: Fri, 18 Apr 2025 08:26:18 GMT
- Title: Beyond One-Hot Labels: Semantic Mixing for Model Calibration
- Authors: Haoyang Luo, Linwei Tao, Minjing Dong, Chang Xu
- Abstract summary: We introduce calibration-aware data augmentation to create synthetic datasets of diverse samples and their ground-truth uncertainty. We propose calibrated reannotation to tackle the misalignment between the annotated confidence score and the mixing ratio. Experimental results demonstrate that CSM achieves superior calibration compared to state-of-the-art calibration approaches.
- Score: 22.39558434131574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model calibration seeks to ensure that models produce confidence scores that accurately reflect the true likelihood of their predictions being correct. However, existing calibration approaches are fundamentally tied to datasets of one-hot labels, implicitly assuming full certainty in all annotations. Such datasets are effective for classification but provide insufficient knowledge of uncertainty for model calibration, necessitating the curation of datasets with numerically rich ground-truth confidence values. However, because visually ambiguous examples are scarce, such samples are rarely available in real datasets. In this paper, we introduce calibration-aware data augmentation to create synthetic datasets of diverse samples and their ground-truth uncertainty. Specifically, we present Calibration-aware Semantic Mixing (CSM), a novel framework that generates training samples with mixed class characteristics and annotates them with distinct confidence scores via diffusion models. Based on this framework, we propose calibrated reannotation to tackle the misalignment between the annotated confidence score and the mixing ratio during the diffusion reverse process. In addition, we explore loss functions that better fit this new data representation paradigm. Experimental results demonstrate that CSM achieves superior calibration compared to state-of-the-art calibration approaches. Code is available at github.com/E-Galois/CSM.
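As a point of reference, here is a minimal sketch of training against the kind of soft, confidence-valued targets CSM produces. The mixing ratio `lam`, the class indices, and the stand-in logits are illustrative assumptions; CSM's diffusion-based sample generation and calibrated reannotation are not reproduced here.
```python
import torch
import torch.nn.functional as F

def soft_target(num_classes: int, class_a: int, class_b: int, lam: float) -> torch.Tensor:
    """Build a two-class soft label: probability lam on class_a, 1 - lam on class_b."""
    target = torch.zeros(num_classes)
    target[class_a] = lam
    target[class_b] = 1.0 - lam
    return target

def soft_cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against soft targets: mean over the batch of -sum_k t_k * log p_k."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()

# Example: one mixed sample annotated as 70% class 3 and 30% class 5.
target = soft_target(num_classes=10, class_a=3, class_b=5, lam=0.7).unsqueeze(0)
logits = torch.randn(1, 10)  # stand-in for a classifier's output on the mixed image
loss = soft_cross_entropy(logits, target)
```
Unlike training on one-hot labels, the target itself carries ground-truth uncertainty, which is the data paradigm the paper's loss-function exploration is built around.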
Related papers
- Combining Priors with Experience: Confidence Calibration Based on Binomial Process Modeling [3.4580564656984736]
Existing confidence calibration methods mostly use statistical techniques to estimate the calibration curve from data. A new calibration metric ($TCE_{bpm}$), which leverages the estimated calibration curve to estimate the true calibration error (TCE), is designed. The effectiveness of our calibration method and metric is verified on both real-world and simulated data.
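For context, a minimal sketch of the standard binned calibration-curve estimate that such statistical approaches start from; the binomial-process model and the $TCE_{bpm}$ metric themselves are not reproduced here.
```python
import numpy as np

def binned_calibration_curve(conf: np.ndarray, correct: np.ndarray, n_bins: int = 15):
    """Per-bin mean confidence vs. empirical accuracy over equal-width bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    mean_conf, mean_acc = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            mean_conf.append(conf[mask].mean())
            mean_acc.append(correct[mask].mean())
    return np.array(mean_conf), np.array(mean_acc)

# Example: synthetic predictions that are well calibrated by construction.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
correct = (rng.uniform(size=1000) < conf).astype(float)
curve_conf, curve_acc = binned_calibration_curve(conf, correct)
```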
arXiv Detail & Related papers (2024-12-14T03:04:05Z)
- CALICO: Confident Active Learning with Integrated Calibration [11.978551396144532]
We propose an AL framework that self-calibrates the confidence used for sample selection during the training process.
We show improved classification performance compared to a softmax-based classifier while requiring fewer labeled samples.
arXiv Detail & Related papers (2024-07-02T15:05:19Z)
- Few-Shot Recalibration of Language Models [23.829795148520834]
We train a recalibration model that takes in a few unlabeled examples from any given slice and predicts a curve that remaps confidence scores to be more accurate for that slice.
Our trained model can recalibrate for arbitrary new slices, without using any labeled data from that slice.
Experiments show that our few-shot recalibrator consistently outperforms existing calibration methods.
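A minimal sketch of applying such a remapping curve, assuming a monotone piecewise-linear parameterization; the paper's recalibrator predicts the curve from a few unlabeled slice examples, which is not reproduced here.
```python
import numpy as np

def remap_confidence(conf: np.ndarray, knots_in: np.ndarray, knots_out: np.ndarray) -> np.ndarray:
    """Remap raw confidence scores through a monotone piecewise-linear curve."""
    return np.interp(conf, knots_in, knots_out)

# Example: a curve that tempers over-confidence on a hypothetical hard slice.
knots_in = np.array([0.0, 0.5, 0.9, 1.0])
knots_out = np.array([0.0, 0.4, 0.7, 0.85])
recalibrated = remap_confidence(np.array([0.95, 0.60]), knots_in, knots_out)
```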
arXiv Detail & Related papers (2024-03-27T06:25:40Z)
- Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
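A minimal sketch of the simplest such consistency measure, agreement frequency among sampled generations; the sampled answers below are illustrative stand-ins for actual LM outputs.
```python
from collections import Counter

def consistency_confidence(answers: list[str]) -> tuple[str, float]:
    """Return the majority answer and the fraction of samples that agree with it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Example: ten sampled generations for one question.
samples = ["42", "42", "41", "42", "42", "42", "43", "42", "42", "42"]
prediction, confidence = consistency_confidence(samples)  # ("42", 0.8)
```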
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z)
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
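A rough sketch of the selective idea, assuming a binary mask that flags pixels as likely misclassified and a fixed pair of temperatures; the paper's actual selection rule and scaling parameters are not reproduced here.
```python
import torch

def selective_scale(logits: torch.Tensor, likely_wrong: torch.Tensor,
                    t_wrong: float = 2.0, t_right: float = 1.0) -> torch.Tensor:
    """Soften logits only where predictions are flagged as likely wrong.

    logits: [num_pixels, num_classes]; likely_wrong: boolean [num_pixels].
    """
    temps = torch.where(likely_wrong, torch.tensor(t_wrong), torch.tensor(t_right))
    return logits / temps.unsqueeze(-1)

# Example: two pixels, the first flagged as likely misclassified.
logits = torch.tensor([[4.0, 1.0, 0.0], [3.0, 0.5, 0.2]])
scaled = selective_scale(logits, torch.tensor([True, False]))
```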
arXiv Detail & Related papers (2022-12-22T22:05:16Z)
- Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
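A minimal sketch of the split-conformal building block that such recalibration generalizes: calibrating interval width from held-out absolute residuals. The variable names are assumptions; MCC's modular recalibrator constructions are not reproduced here.
```python
import numpy as np

def split_conformal_interval(preds_cal, y_cal, preds_test, alpha=0.1):
    """Symmetric prediction intervals with ~(1 - alpha) marginal coverage."""
    scores = np.abs(y_cal - preds_cal)  # conformity scores on the calibration set
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return preds_test - q, preds_test + q

# Example with a synthetic noisy regression problem.
rng = np.random.default_rng(0)
y_cal = rng.normal(size=500)
preds_cal = y_cal + rng.normal(scale=0.3, size=500)  # imperfect predictions
lower, upper = split_conformal_interval(preds_cal, y_cal, preds_test=np.array([0.0, 1.5]))
```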
arXiv Detail & Related papers (2022-06-23T03:25:23Z)
- Confidence Calibration for Intent Detection via Hyperspherical Space and Rebalanced Accuracy-Uncertainty Loss [17.26964140836123]
In some scenarios, users care not only about the accuracy but also about the confidence of the model.
We propose a model using hyperspherical space and a rebalanced accuracy-uncertainty loss.
Our model outperforms the existing calibration methods and achieves a significant improvement on the calibration metric.
arXiv Detail & Related papers (2022-03-17T12:01:33Z)
- Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
This text examines the interplay between three of the simplest and most commonly used approaches to leveraging deep learning when data is scarce.
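For reference, a minimal sketch of standard mixup regularization, one of the techniques whose interaction with ensembling the paper analyzes; the batch tensors below are illustrative.
```python
import torch

def mixup_batch(x: torch.Tensor, y_onehot: torch.Tensor, alpha: float = 0.2):
    """Standard mixup: convex-combine a batch with a shuffled copy of itself."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

# Example: a batch of 8 inputs with 10 one-hot classes.
x = torch.randn(8, 3, 32, 32)
y = torch.nn.functional.one_hot(torch.randint(0, 10, (8,)), 10).float()
x_mix, y_mix = mixup_batch(x, y)
```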
arXiv Detail & Related papers (2020-07-17T07:32:24Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world and synthetic datasets.
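A minimal sketch of importance-weighting a binned calibration-error estimate under covariate shift; obtaining the density-ratio weights (e.g., from a domain classifier) is assumed, and the paper's full procedure is not reproduced here.
```python
import numpy as np

def weighted_ece(conf, correct, weights, n_bins=15):
    """Expected calibration error with importance weights w(x) ~ p_target / p_source."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total, ece = weights.sum(), 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            w = weights[mask]
            acc = (w * correct[mask]).sum() / w.sum()
            avg_conf = (w * conf[mask]).sum() / w.sum()
            ece += (w.sum() / total) * abs(acc - avg_conf)
    return ece

# Example: uniform weights reduce this to the ordinary binned ECE.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 1000)
correct = (rng.uniform(size=1000) < conf).astype(float)
print(weighted_ece(conf, correct, np.ones(1000)))
```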
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
- On the Role of Dataset Quality and Heterogeneity in Model Confidence [27.657631193015252]
Safety-critical applications require machine learning models that output accurate and calibrated probabilities.
Uncalibrated deep networks are known to make over-confident predictions.
We study the impact of dataset quality by examining the effects of dataset size and label noise on model confidence.
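A minimal sketch of the kind of controlled perturbation such a study relies on: injecting symmetric label noise at a chosen rate. The uniform-flip noise model here is an assumption, not necessarily the paper's exact protocol.
```python
import numpy as np

def flip_labels(y: np.ndarray, num_classes: int, noise_rate: float, seed: int = 0) -> np.ndarray:
    """Replace a fraction `noise_rate` of labels with uniformly random classes."""
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    mask = rng.uniform(size=len(y)) < noise_rate
    y_noisy[mask] = rng.integers(0, num_classes, size=mask.sum())
    return y_noisy

# Example: corrupt 20% of 1,000 ten-class labels.
labels = np.random.default_rng(1).integers(0, 10, size=1000)
noisy = flip_labels(labels, num_classes=10, noise_rate=0.2)
```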
arXiv Detail & Related papers (2020-02-23T05:13:12Z)