Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
- URL: http://arxiv.org/abs/2505.01997v2
- Date: Sat, 07 Jun 2025 02:46:20 GMT
- Title: Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
- Authors: Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Qi Long, Weijie J. Su, Li Shen
- Abstract summary: Preference alignment is a key technology for the success of Large Language Models (LLMs). In this paper, we investigate why preference alignment affects calibration and how to address this issue.
- Score: 29.069314998955676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key technologies for the success of Large Language Models (LLMs) is preference alignment. However, a notable side effect of preference alignment is poor calibration: while the pre-trained models are typically well-calibrated, LLMs tend to become poorly calibrated after alignment with human preferences. In this paper, we investigate why preference alignment affects calibration and how to address this issue. For the first question, we observe that the preference collapse issue in alignment undesirably generalizes to the calibration scenario, causing LLMs to exhibit overconfidence and poor calibration. To address this, we demonstrate the importance of fine-tuning with domain-specific knowledge to alleviate the overconfidence issue. To further analyze whether this affects the model's performance, we categorize models into two regimes: calibratable and non-calibratable, defined by bounds of Expected Calibration Error (ECE). In the calibratable regime, we propose a calibration-aware fine-tuning approach to achieve proper calibration without compromising LLMs' performance. However, as models are further fine-tuned for better performance, they enter the non-calibratable regime. For this case, we develop an EM-algorithm-based ECE regularization for the fine-tuning loss to maintain low calibration error. Extensive experiments validate the effectiveness of the proposed methods.
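The calibratable and non-calibratable regimes above are defined through bounds on the Expected Calibration Error (ECE), i.e., the gap between a model's stated confidence and its empirical accuracy. As a reference point, here is a minimal sketch of the standard binned ECE estimator (the bin count and the toy overconfident data are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: per-bin |accuracy - confidence| gaps,
    weighted by the fraction of samples that fall in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
        ece += in_bin.mean() * gap
    return ece

# Toy overconfident model: ~95% stated confidence, ~70% actual accuracy -> large ECE.
rng = np.random.default_rng(0)
conf = rng.uniform(0.90, 1.00, size=1000)
corr = rng.random(1000) < 0.70
print(round(expected_calibration_error(conf, corr), 3))
```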
Related papers
- Unconstrained Monotonic Calibration of Predictions in Deep Ranking Systems [29.90543561470141]
A ranking model's absolute values are essential for certain downstream tasks. Existing calibration approaches typically employ predefined transformation functions with order-preserving properties to adjust the original predictions. We propose implementing a calibrator using an Unconstrained Monotonic Neural Network (UMNN), which can learn arbitrary monotonic functions. This approach significantly relaxes the constraints on the calibrator, improving its flexibility and expressiveness while avoiding excessively distorting the original predictions.
arXiv Detail & Related papers (2025-04-19T09:35:11Z)
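As a loose illustration of the monotone-calibrator idea summarized above (not the paper's UMNN, which parameterizes the calibrator as the integral of a strictly positive network; the sketch below uses the simpler trick of softplus-constrained weights, and all names and hyperparameters are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneCalibrator(nn.Module):
    """Monotone-increasing scalar calibrator: raw weights pass through softplus,
    so each layer (and hence the whole map) preserves the order of the inputs."""
    def __init__(self, hidden=16):
        super().__init__()
        self.w1 = nn.Parameter(0.1 * torch.randn(hidden, 1))
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(0.1 * torch.randn(1, hidden))
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, score):  # score: (N, 1) raw ranking score
        h = torch.tanh(F.linear(score, F.softplus(self.w1), self.b1))
        return F.linear(h, F.softplus(self.w2), self.b2)  # calibrated logit

# Fit on held-out (score, label) pairs with binary cross-entropy; the ranking order
# of the original scores is preserved by construction.
calib = MonotoneCalibrator()
opt = torch.optim.Adam(calib.parameters(), lr=1e-2)
scores = torch.randn(256, 1)                                    # toy raw scores
labels = (scores + 0.3 * torch.randn_like(scores) > 0).float()  # toy click labels
for _ in range(200):
    opt.zero_grad()
    F.binary_cross_entropy_with_logits(calib(scores), labels).backward()
    opt.step()
```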
- Balancing Two Classifiers via A Simplex ETF Structure for Model Calibration [34.52946891778497]
Deep neural networks (DNNs) have demonstrated state-of-the-art performance across various domains. They often face calibration issues, particularly in safety-critical applications such as autonomous driving and healthcare. Recent research has started to improve model calibration from the view of the classifier.
arXiv Detail & Related papers (2025-04-14T09:09:01Z)
- Does Alignment Tuning Really Break LLMs' Internal Confidence? [5.893124686141782]
Large Language Models (LLMs) have shown remarkable progress, but their real-world application necessitates reliable calibration. This study conducts a comprehensive analysis of calibration degradation of LLMs across four dimensions: models, calibration metrics, tasks, and confidence extraction methods.
arXiv Detail & Related papers (2024-08-31T05:12:36Z)
- Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
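A minimal sketch of the consistency-based confidence idea summarized above, using simple majority agreement as the consistency measure (the paper studies several measures; the sampling step and answer extraction are assumed to happen elsewhere):

```python
from collections import Counter

def consistency_confidence(answers):
    """Confidence from agreement among sampled generations: the majority answer's
    vote share. One of several possible consistency measures (agreement-, entropy-,
    and spread-based variants are all plausible)."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Hypothetical usage: sample the same question several times at temperature > 0,
# extract the final answer string from each generation, then aggregate.
samples = ["42", "42", "41", "42", "42"]   # toy extracted answers
answer, confidence = consistency_confidence(samples)
print(answer, confidence)                  # "42", 0.8
```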
- Adaptive Calibrator Ensemble for Model Calibration under Distribution Shift [23.794897699193875]
Adaptive calibrator ensemble (ACE) calibrates OOD datasets whose difficulty is usually higher than that of the calibration set.
ACE generally improves the performance of a few state-of-the-art calibration schemes on a series of OOD benchmarks.
arXiv Detail & Related papers (2023-03-09T15:22:02Z)
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z)
- Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z)
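To make the "any regression model into a calibrated probabilistic model" idea above concrete, here is a standard split-conformal interval recipe, a common building block in this space rather than the paper's full MCC framework; the data and coverage level are illustrative:

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    """Split-conformal prediction: use held-out absolute residuals to pick a radius q
    such that [y_hat - q, y_hat + q] covers the truth with probability ~ 1 - alpha."""
    n = len(residuals_cal)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(np.abs(residuals_cal))[min(k, n) - 1]
    return y_pred_test - q, y_pred_test + q

# Toy usage with a hypothetical point regressor's outputs on a calibration split.
rng = np.random.default_rng(0)
y_cal_true = rng.normal(size=500)
y_cal_pred = y_cal_true + rng.normal(scale=0.5, size=500)
lo, hi = split_conformal_interval(y_cal_true - y_cal_pred, y_pred_test=np.array([0.3, -1.2]))
print(lo, hi)
```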
- Meta-Calibration: Learning of Model Calibration Using Differentiable Expected Calibration Error [46.12703434199988]
We introduce a new differentiable surrogate for expected calibration error (DECE) that allows calibration quality to be directly optimised.
We also propose a meta-learning framework that uses DECE to optimise for validation set calibration.
arXiv Detail & Related papers (2021-06-17T15:47:50Z)
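A rough sketch of what a differentiable ECE surrogate can look like: hard bin membership is replaced by a soft assignment so the metric can sit inside a training objective. This soft-binning variant and its hyperparameters are assumptions, not necessarily the paper's DECE:

```python
import torch

def soft_binned_ece(confidences, correct, n_bins=10, sharpness=50.0):
    """Differentiable ECE surrogate: soft bin memberships (softmax over squared
    distance to bin centers) replace hard binning, so gradients reach the confidences."""
    centers = torch.linspace(0.5 / n_bins, 1.0 - 0.5 / n_bins, n_bins)
    member = torch.softmax(-sharpness * (confidences[:, None] - centers[None, :]) ** 2, dim=1)
    mass = member.sum(0).clamp_min(1e-8)                       # soft sample count per bin
    bin_conf = (member * confidences[:, None]).sum(0) / mass
    bin_acc = (member * correct[:, None]).sum(0) / mass
    return (mass / mass.sum() * (bin_acc - bin_conf).abs()).sum()

# Because the surrogate is differentiable, it can be added to a fine-tuning objective,
# e.g. loss = task_loss + lam * soft_binned_ece(conf, correct).
conf = torch.rand(128, requires_grad=True)
corr = (torch.rand(128) < 0.7).float()
soft_binned_ece(conf, corr).backward()
print(conf.grad is not None)   # True
```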
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the localized calibration error (LCE) more than existing recalibration methods do.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Uncertainty Quantification and Deep Ensembles [79.4957965474334]
We show that deep-ensembles do not necessarily lead to improved calibration properties.
We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models.
This work examines the interplay between three of the simplest and most commonly used approaches for leveraging deep learning when data is scarce.
arXiv Detail & Related papers (2020-07-17T07:32:24Z)
- Unsupervised Calibration under Covariate Shift [92.02278658443166]
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it.
We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv Detail & Related papers (2020-06-29T21:50:07Z)
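A schematic of the importance-sampling idea summarized above: weights estimating the target/source density ratio (e.g. from a domain classifier, assumed to be available here) turn ordinary temperature scaling into an approximation of the calibration objective on the shifted domain. This is a hedged reconstruction, not the authors' exact procedure:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def weighted_temperature(logits_src, labels_src, weights):
    """Fit a single temperature by minimizing the importance-weighted NLL
    on labeled source data; weights ~ p_target(x) / p_source(x)."""
    idx = np.arange(len(labels_src))
    def weighted_nll(t):
        z = logits_src / t
        logp = z - logsumexp(z, axis=1, keepdims=True)
        return -(weights * logp[idx, labels_src]).mean()
    return minimize_scalar(weighted_nll, bounds=(0.05, 10.0), method="bounded").x

# Toy usage: overconfident 5-way logits, ~70% accurate, with hypothetical density-ratio weights.
rng = np.random.default_rng(0)
logits = 3.0 * rng.normal(size=(200, 5))
labels = np.where(rng.random(200) < 0.7, logits.argmax(axis=1), rng.integers(0, 5, 200))
w = rng.uniform(0.5, 2.0, size=200)
print(weighted_temperature(logits, labels, w))   # typically > 1 for overconfident logits
```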
This list is automatically generated from the titles and abstracts of the papers in this site.