Related papers: Direct Confidence Alignment: Aligning Verbalized Confidence with Internal Confidence In Large Language Models

Direct Confidence Alignment: Aligning Verbalized Confidence with Internal Confidence In Large Language Models

URL: http://arxiv.org/abs/2512.11998v1
Date: Fri, 12 Dec 2025 19:29:05 GMT
Title: Direct Confidence Alignment: Aligning Verbalized Confidence with Internal Confidence In Large Language Models
Authors: Glenn Zhang, Treasure Mayowa, Jason Fan, Yicheng Fu, Aaron Sandoval, Sean O'Brien, Kevin Zhu,
Abstract summary: Internal confidence of a model, derived from token probabilities, is not well aligned with its verbalized confidence.<n>We propose Direct Confidence Alignment (DCA) to align an LLM's verbalized confidence with its internal confidence.
Score: 6.918665116014629
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Producing trustworthy and reliable Large Language Models (LLMs) has become increasingly important as their usage becomes more widespread. Calibration seeks to achieve this by improving the alignment between the model's confidence and the actual likelihood of its responses being correct or desirable. However, it has been observed that the internal confidence of a model, derived from token probabilities, is not well aligned with its verbalized confidence, leading to misleading results with different calibration methods. In this paper, we propose Direct Confidence Alignment (DCA), a method using Direct Preference Optimization to align an LLM's verbalized confidence with its internal confidence rather than ground-truth accuracy, enhancing model transparency and reliability by ensuring closer alignment between the two confidence measures. We evaluate DCA across multiple open-weight LLMs on a wide range of datasets. To further assess this alignment, we also introduce three new calibration error-based metrics. Our results show that DCA improves alignment metrics on certain model architectures, reducing inconsistencies in a model's confidence expression. However, we also show that it can be ineffective on others, highlighting the need for more model-aware approaches in the pursuit of more interpretable and trustworthy LLMs.

Related papers

On Calibration of Large Language Models: From Response To Capability [66.59139960234326]
Large language models (LLMs) are widely deployed as general-purpose problem solvers.<n>We introduce capability calibration, which targets the model's expected accuracy on a query.<n>Our results demonstrate that capability-calibrated confidence improves pass@$k$ prediction and inference budget allocation.
arXiv Detail & Related papers (2026-02-14T01:07:45Z)
Calibrating Verbalized Confidence with Self-Generated Distractors [24.56911906044891]
We introduce Distractor-Normalized Coherence (DINCO)<n>DINCO estimates and accounts for an LLM's suggestibility bias by having the model its confidence independently across several self-generated distractors.<n>We frame the popular approach of self-consistency as leveraging coherence across sampled generations, and normalized verbalized confidence as leveraging coherence across validations on incompatible claims.
arXiv Detail & Related papers (2025-09-29T21:41:22Z)
ConfTuner: Training Large Language Models to Express Their Confidence Verbally [58.63318088243125]
Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as science, law, and healthcare.<n>LLMs are often observed to generate incorrect answers with high confidence, a phenomenon known as "overconfidence"
arXiv Detail & Related papers (2025-08-26T09:25:32Z)
Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation [63.49409574310576]
Large language models (LLMs) exhibit overconfidence, assigning high confidence scores to incorrect predictions.<n>We introduce FineCE, a novel confidence estimation method that delivers accurate, fine-grained confidence scores during text generation.<n>Our code and all baselines used in the paper are available on GitHub.
arXiv Detail & Related papers (2025-08-16T13:29:35Z)
Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution [20.607071807794195]
Large Language Models (LLMs) are widely used as automated judges, where practical value depends on both accuracy and trustworthy, risk-aware judgments.<n>Existing approaches predominantly focus on accuracy, overlooking the necessity of well-calibrated confidence.<n>We advocate a shift from accuracy-centric evaluation to confidence-driven, risk-aware LLM-as-a-Judge systems.
arXiv Detail & Related papers (2025-08-08T11:11:22Z)
Fact-Level Confidence Calibration and Self-Correction [64.40105513819272]
We propose a Fact-Level framework that calibrates confidence to relevance-weighted correctness at the fact level. We also develop Confidence-Guided Fact-level Self-Correction ($textbfConFix$), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones.
arXiv Detail & Related papers (2024-11-20T14:15:18Z)
Confidence Estimation for LLM-Based Dialogue State Tracking [9.305763502526833]
Estimation of a model's confidence on its outputs is critical for Conversational AI systems based on large language models (LLMs) We provide an exhaustive exploration of methods, including approaches proposed for open- and closed-weight LLMs. Our findings suggest that fine-tuning open-weight LLMs can result in enhanced AUC performance, indicating better confidence score calibration.
arXiv Detail & Related papers (2024-09-15T06:44:26Z)
Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models [14.5291643644017]
We introduce the concept of Confidence-Probability Alignment. We probe the alignment between models' internal and expressed confidence. Among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment.
arXiv Detail & Related papers (2024-05-25T15:42:04Z)
Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency. Results show that consistency-based calibration methods outperform existing post-hoc approaches. We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision language models. We show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data. Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
arXiv Detail & Related papers (2023-11-03T05:41:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.