Related papers: Does Alignment Tuning Really Break LLMs' Internal Confidence?

URL: http://arxiv.org/abs/2409.00352v1
Date: Sat, 31 Aug 2024 05:12:36 GMT
Title: Does Alignment Tuning Really Break LLMs' Internal Confidence?
Authors: Hongseok Oh, Wonseok Hwang
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have shown remarkable progress, but their real-world application necessitates reliable calibration. This study conducts a comprehensive analysis of calibration degradation of LLMs across four dimensions: models, calibration metrics, tasks, and confidence extraction methods. Initial analysis showed that the relationship between alignment and calibration is not always a trade-off, but under stricter analysis conditions, we found the alignment process consistently harms calibration. This highlights the need for (1) a careful approach when measuring model confidences and calibration errors and (2) future research into algorithms that can help LLMs to achieve both instruction-following and calibration without sacrificing either.
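The abstract refers to calibration metrics and calibration errors without defining them. As a hedged illustration, the sketch below computes expected calibration error (ECE) with equal-width confidence bins, one widely used metric. The function name, the bin count, and the binning scheme are assumptions here, not necessarily the settings used in the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: bin-size-weighted mean of |accuracy - confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Map each confidence in [0, 1] to a bin index; a confidence of 1.0 joins the last bin.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap  # weight by the fraction of examples in this bin
    return ece
```

For example, two answers at confidence 0.9 with one correct, plus two answers at confidence 0.6 with both correct, give a gap of 0.4 in each bin and an ECE of 0.4. A perfectly calibrated model would score 0.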

Related papers

On Calibration of Large Language Models: From Response To Capability
Large language models (LLMs) are widely deployed as general-purpose problem solvers. We introduce capability calibration, which targets the model's expected accuracy on a query. Our results demonstrate that capability-calibrated confidence improves pass@k prediction and inference budget allocation.
arXiv (2026-02-14T01:07:45Z)
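One way to connect a capability-level confidence to pass@k prediction is as follows. Suppose a confidence p estimates the model's expected per-sample accuracy on a query, and the k samples are assumed independent. Under those assumptions, the predicted pass@k is 1 - (1 - p)^k. The sketch below pairs that prediction with the standard unbiased empirical estimator from n samples with c correct (Chen et al., 2021), which the prediction could be compared against. The independence assumption and both function names are illustrative and are not taken from the paper.

```python
from math import comb

def pass_at_k_from_confidence(p, k):
    """Predicted pass@k if each of k independent samples is correct with probability p."""
    return 1.0 - (1.0 - p) ** k

def pass_at_k_unbiased(n, c, k):
    """Unbiased empirical pass@k from n samples, c of them correct (Chen et al., 2021)."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```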
Towards Objective Fine-tuning: How LLMs' Prior Knowledge Causes Potential Poor Calibration?
Large Language Models (LLMs) often demonstrate poor calibration, with confidence scores misaligned with actual performance. Our research reveals that LLMs' prior knowledge causes potential poor calibration due to the ubiquitous presence of known data in real-world fine-tuning. We propose CogCalib, a cognition-aware framework that applies targeted learning strategies according to the model's prior knowledge.
arXiv (2025-05-27T08:51:31Z)
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Preference alignment is a key technology for the success of Large Language Models (LLMs). In this paper, we investigate why preference alignment affects calibration and how to address this issue.
arXiv (2025-05-04T05:42:51Z)
Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles
Calib-n is a novel framework that trains an auxiliary model for confidence estimation. We find that few-shot prompts are the most effective for auxiliary model-based methods.
arXiv (2025-01-07T18:48:42Z)
Calibrating Large Language Models with Sample Consistency
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency. Results show that consistency-based calibration methods outperform existing post-hoc approaches. We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv (2024-02-21T16:15:20Z)
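A minimal sketch of consistency-based confidence follows. It assumes the sampled answers have already been normalized to comparable strings. The approach takes the majority answer and uses the fraction of samples that agree with it as the confidence. This is only one simple agreement measure. The paper compares three consistency measures, and this code (including the hypothetical name `agreement_confidence`) does not claim to reproduce any of them exactly.

```python
from collections import Counter

def agreement_confidence(answers):
    """Return (majority answer, fraction of sampled answers that match it)."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]  # most frequent answer and its count
    return answer, votes / len(answers)
```

With four samples ["42", "42", "41", "42"], the majority answer is "42" and the agreement-based confidence is 0.75.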
Calibrating Long-form Generations from Large Language Models
The confidence scores of Large Language Models (LLMs) should align with the actual likelihood of their responses being correct. Current confidence elicitation methods and calibration metrics rely on a binary true/false assessment of response correctness. We introduce a unified calibration framework, in which both the correctness of the LLMs' responses and their associated confidence levels are treated as distributions across a range of scores.
arXiv (2024-02-09T17:00:32Z)
On the Calibration of Large Language Models and Alignment
Confidence calibration serves as a crucial tool for gauging the reliability of deep models. We conduct a systematic examination of the calibration of aligned language models throughout the entire construction process. Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.
arXiv (2023-11-22T08:57:55Z)
Bag of Tricks for In-Distribution Calibration of Pretrained Transformers
We present an empirical study on confidence calibration for pre-trained language models (PLMs). We find that the ensemble model overfitted to the training set shows sub-par calibration performance. We propose the Calibrated PLM (CALL), a combination of calibration techniques.
arXiv (2023-02-13T21:11:52Z)
A Close Look into the Calibration of Pre-trained Language Models
Pre-trained language models (PLMs) may fail to give reliable estimates of their predictive uncertainty. We study the dynamic change in PLMs' calibration performance during training. We extend two recently proposed learnable methods that directly collect data to train models to produce reasonable confidence estimates.
arXiv (2022-10-31T21:31:07Z)
Localized Calibration: Metrics and Recalibration
We propose a fine-grained calibration metric, the local calibration error (LCE), that spans the gap between fully global and fully individualized calibration. We then introduce a localized recalibration method, LoRe, that reduces the LCE more than existing recalibration methods.
arXiv (2021-02-22T07:22:12Z)
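To make the idea of local (rather than global) calibration concrete, the sketch below computes a kernel-weighted calibration gap around each example in a feature space. It is a simplified illustration under stated assumptions: an RBF kernel, Euclidean distances, and a hypothetical `bandwidth` parameter. It is not the exact LCE definition from the paper.

```python
import numpy as np

def local_calibration_gaps(features, confidences, correct, bandwidth=1.0):
    """Per-example |kernel-weighted mean of (confidence - correctness)| in feature space."""
    X = np.asarray(features, dtype=float)        # shape (n_examples, n_features)
    conf = np.asarray(confidences, dtype=float)
    acc = np.asarray(correct, dtype=float)
    # Pairwise squared Euclidean distances between feature vectors.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))  # RBF similarity; diagonal is 1
    # Weighted average of the signed gap around each example, then its magnitude.
    signed = weights @ (conf - acc) / weights.sum(axis=1)
    return np.abs(signed)
```

Consider two distant examples, both at confidence 0.5, where one is correct and one is wrong. The global gap (mean confidence minus mean accuracy) is 0, yet each local gap is 0.5. This is the kind of miscalibration a purely global metric can hide.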
Uncertainty Quantification and Deep Ensembles
We show that deep ensembles do not necessarily lead to improved calibration properties. We show that standard ensembling methods, when used in conjunction with modern techniques such as mixup regularization, can lead to less calibrated models. This work examines the interplay between three of the simplest and most commonly used approaches for leveraging deep learning when data is scarce.
arXiv (2020-07-17T07:32:24Z)
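For reference, a deep ensemble's prediction is commonly formed by averaging the members' predicted class probabilities, with the maximum averaged probability used as the confidence. The minimal sketch below assumes the members' softmax outputs are already available, and `ensemble_predict` is a hypothetical name. As the summary notes, this averaging does not by itself guarantee better calibration, so the resulting confidences should still be checked with a calibration metric.

```python
import numpy as np

def ensemble_predict(member_probs):
    """member_probs: array of shape (n_members, n_examples, n_classes) of softmax outputs."""
    probs = np.asarray(member_probs, dtype=float).mean(axis=0)  # average over members
    # Predicted class and its averaged probability (used as the confidence).
    return probs.argmax(axis=-1), probs.max(axis=-1)
```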
Unsupervised Calibration under Covariate Shift
We introduce the problem of calibration under domain shift and propose an importance sampling based approach to address it. We evaluate and discuss the efficacy of our method on both real-world datasets and synthetic datasets.
arXiv (2020-06-29T21:50:07Z)
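Under covariate shift, calibration on the target domain can be estimated from labeled source data. Each source example is reweighted with an importance weight w(x) = p_target(x) / p_source(x). The sketch below computes an importance-weighted binned calibration error and assumes those weights are already given. Estimating the weights is the hard part and is not shown. This is a generic illustration of importance sampling for calibration, with a hypothetical function name, and not the method proposed in the paper.

```python
import numpy as np

def importance_weighted_ece(confidences, correct, weights, n_bins=10):
    """Binned calibration error where each source example counts with weight w(x)."""
    conf = np.asarray(confidences, dtype=float)
    acc = np.asarray(correct, dtype=float)
    w = np.asarray(weights, dtype=float)  # assumed density ratios p_target(x) / p_source(x)
    bin_ids = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    total = w.sum()
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        wb = w[mask].sum()
        if wb == 0:
            continue  # empty bin (or zero total weight) contributes nothing
        acc_b = (w[mask] * acc[mask]).sum() / wb    # weighted accuracy in the bin
        conf_b = (w[mask] * conf[mask]).sum() / wb  # weighted mean confidence in the bin
        ece += (wb / total) * abs(acc_b - conf_b)
    return ece
```

With uniform weights this reduces to ordinary equal-width binned ECE. Upweighting examples that are more typical of the target domain shifts the estimate toward target-domain calibration.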

This list is automatically generated from the titles and abstracts of the papers on this site.