Are LLM Decisions Faithful to Verbal Confidence?
- URL: http://arxiv.org/abs/2601.07767v1
- Date: Mon, 12 Jan 2026 17:49:51 GMT
- Title: Are LLM Decisions Faithful to Verbal Confidence?
- Authors: Jiawei Wang, Yanfei Zhou, Siddartha Devic, Deqing Fu,
- Abstract summary: We introduce a framework designed to evaluate whether models adjust their abstention policies in response to varying error penalties. Our evaluation of several frontier models reveals a critical dissociation: models are neither cost-aware when articulating their verbal confidence, nor strategically responsive when deciding whether to engage or abstain. This indicates that calibrated verbal confidence scores may not be sufficient to create trustworthy and interpretable AI systems.
- Score: 15.666596480779104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) can produce surprisingly sophisticated estimates of their own uncertainty. However, it remains unclear to what extent this expressed confidence is tied to the reasoning, knowledge, or decision making of the model. To test this, we introduce $\textbf{RiskEval}$: a framework designed to evaluate whether models adjust their abstention policies in response to varying error penalties. Our evaluation of several frontier models reveals a critical dissociation: models are neither cost-aware when articulating their verbal confidence, nor strategically responsive when deciding whether to engage or abstain under high-penalty conditions. Even when extreme penalties render frequent abstention the mathematically optimal strategy, models almost never abstain, resulting in utility collapse. This indicates that calibrated verbal confidence scores may not be sufficient to create trustworthy and interpretable AI systems, as current models lack the strategic agency to convert uncertainty signals into optimal and risk-sensitive decisions.
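The abstract's claim that extreme penalties make frequent abstention "mathematically optimal" can be made concrete with a standard expected-utility comparison. The sketch below is illustrative only and assumes a simple payoff structure (reward 1 for a correct answer, a large negative penalty for a wrong one, 0 for abstaining); it is not RiskEval's exact scoring rule, and the function names are hypothetical.

```python
# Minimal sketch (not RiskEval's exact setup): a risk-sensitive agent should
# abstain whenever the expected utility of answering falls below the fixed
# utility of abstaining.

def expected_utility_of_answering(confidence: float,
                                  reward_correct: float = 1.0,
                                  penalty_wrong: float = -10.0) -> float:
    """Expected utility of committing to an answer, given the model's
    stated probability of being correct."""
    return confidence * reward_correct + (1.0 - confidence) * penalty_wrong


def should_abstain(confidence: float,
                   reward_correct: float = 1.0,
                   penalty_wrong: float = -10.0,
                   utility_abstain: float = 0.0) -> bool:
    """Answer only if answering beats abstention in expectation."""
    return expected_utility_of_answering(
        confidence, reward_correct, penalty_wrong) < utility_abstain


# With a -10 penalty, answering is only optimal when confidence exceeds
# 10/11 (about 0.91), so a calibrated 80%-confident model should abstain.
print(should_abstain(0.80))  # True  -> abstain
print(should_abstain(0.95))  # False -> answer
```

Under such a rule, a model that keeps answering at 80% confidence while the penalty grows accrues increasingly negative expected utility, which is the "utility collapse" the abstract describes.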
Related papers
- Decision-Aware Trust Signal Alignment for SOC Alert Triage [0.0]
This paper presents a decision-aware trust signal alignment scheme for SOC alert triage. The framework combines calibrated confidence, lightweight uncertainty cues, and cost-sensitive decision thresholds into a coherent decision-support layer. We show that misaligned displays of confidence greatly amplify false negatives, whereas cost-weighted loss decreases by orders of magnitude for models with decision-aligned trust signals.
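The summary above does not spell out how the cost-sensitive thresholds are set; as a generic illustration (not this paper's formulation), a triage threshold can be derived by comparing expected costs. With a calibrated probability $p$ that an alert is a true incident, a cost $C_{FN}$ for dismissing a real incident, and a cost $C_{FP}$ for escalating a benign alert, escalation is the lower-cost action when

$$
p \, C_{FN} > (1 - p)\, C_{FP}
\quad\Longleftrightarrow\quad
p > \frac{C_{FP}}{C_{FP} + C_{FN}},
$$

so heavier false-negative costs push the escalation threshold toward zero, which is one way miscalibrated confidence can amplify false negatives in cost-weighted terms.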
arXiv Detail & Related papers (2026-01-08T01:41:54Z) - Decoding Uncertainty: The Impact of Decoding Strategies for Uncertainty Estimation in Large Language Models [58.198220611190884]
We investigate the impact of decoding strategies on uncertainty estimation in Large Language Models (LLMs). Our experiments show that Contrastive Search, which mitigates repetition, yields better uncertainty estimates on average across a range of preference-aligned LLMs.
arXiv Detail & Related papers (2025-09-20T13:48:13Z) - Can LLMs Detect Their Confabulations? Estimating Reliability in Uncertainty-Aware Language Models [24.72990207218907]
Large Language Models (LLMs) are prone to generating fluent but incorrect content, known as confabulation. We investigate how in-context information influences model behavior and whether LLMs can identify their unreliable responses.
arXiv Detail & Related papers (2025-08-11T16:12:36Z) - Query-Level Uncertainty in Large Language Models [39.59641844929696]
We propose a method to detect knowledge boundaries via Query-Level Uncertainty. This method estimates if a model is capable of answering a given query before generating any tokens, thus avoiding the generation cost. We demonstrate its benefits in adaptive inference settings, showing that for RAG and model cascading it reduces inference costs while preserving overall performance.
arXiv Detail & Related papers (2025-06-11T12:39:48Z) - Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence [16.311538811237536]
Large language models (LLMs) are increasingly used for factual question-answering and often verbalize their uncertainty. For these verbalized expressions of uncertainty to be meaningful, they should reflect the error rates at the expressed level of confidence. We propose a simple procedure, uncertainty distillation, to teach an LLM to express calibrated semantic confidences.
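The requirement that verbalized confidence "reflect the error rates at the expressed level of confidence" is a calibration criterion; a standard way to quantify it is expected calibration error (ECE). The sketch below is a generic illustration of that metric, not this paper's distillation procedure.

```python
# Generic ECE sketch: verbalized confidences are calibrated when, within each
# confidence bin, the stated confidence matches the empirical accuracy.
import numpy as np

def expected_calibration_error(confidences: np.ndarray,
                               correct: np.ndarray,
                               n_bins: int = 10) -> float:
    """confidences: stated probabilities in [0, 1]; correct: 0/1 outcomes."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

# A model that says "90%" but is right only 60% of the time in that bin
# contributes a 0.3 gap, weighted by how often it says "90%".
conf = np.array([0.9, 0.9, 0.9, 0.9, 0.9, 0.6, 0.6, 0.6, 0.6, 0.6])
hits = np.array([1, 1, 1, 0, 0, 1, 1, 1, 0, 1])
print(round(expected_calibration_error(conf, hits), 3))  # 0.25
```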
arXiv Detail & Related papers (2025-03-18T21:29:29Z) - Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models [63.559461750135334]
Language models (LMs) are increasingly used to build agents that can act autonomously to achieve goals. We study the resulting "answer-or-defer" problem with an evaluation framework that systematically varies human-specified risk structures. We find that a simple skill-decomposition method, which isolates the independent skills required for answer-or-defer decision making, can consistently improve LMs' decision policies.
arXiv Detail & Related papers (2025-03-03T09:16:26Z) - On Evaluating the Durability of Safeguards for Open-Weight LLMs [80.36750298080275]
We discuss whether technical safeguards can impede the misuse of large language models (LLMs). We show that even evaluating these defenses is exceedingly difficult and can easily mislead audiences into thinking that safeguards are more durable than they really are. We suggest that future research carefully cabin claims to more constrained, well-defined, and rigorously examined threat models.
arXiv Detail & Related papers (2024-12-10T01:30:32Z) - Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning [10.457661605916435]
Large language models (LLMs) have revolutionized the field of natural language processing with their impressive reasoning and question-answering capabilities. LLMs are sometimes prone to generating credible-sounding but incorrect information, a phenomenon known as hallucination. We introduce a novel uncertainty-aware causal language modeling loss function, grounded in the principles of decision theory.
arXiv Detail & Related papers (2024-12-03T23:14:47Z) - Large Language Models Must Be Taught to Know What They Don't Know [97.90008709512921]
We show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We also investigate the mechanisms that enable reliable uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators.
arXiv Detail & Related papers (2024-06-12T16:41:31Z) - Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
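The ensembling step described above can be sketched in a few lines; `generate_clarifications` and `llm_answer` are hypothetical stand-ins for the paper's components, not its actual API.

```python
# Minimal sketch of input clarification ensembling: rewrite the input several
# ways, answer each rewrite, and aggregate. Disagreement across clarifications
# points to ambiguity in the input rather than uncertainty in the model.
from collections import Counter

def clarification_ensemble(question: str,
                           generate_clarifications,
                           llm_answer,
                           n_clarifications: int = 5):
    clarified_inputs = generate_clarifications(question, n_clarifications)
    answers = [llm_answer(c) for c in clarified_inputs]
    votes = Counter(answers)
    majority_answer, count = votes.most_common(1)[0]
    agreement = count / len(answers)  # low agreement -> ambiguous input
    return majority_answer, agreement
```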
arXiv Detail & Related papers (2023-11-15T05:58:35Z) - Improving the Reliability of Large Language Models by Leveraging
Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z) - A Meta-heuristic Approach to Estimate and Explain Classifier Uncertainty [0.4264192013842096]
This work proposes a set of class-independent meta-heuristics that can characterize the complexity of an instance in terms of factors that are mutually relevant to both human and machine learning decision-making.
The proposed measures and framework hold promise for improving model development for more complex instances, as well as providing a new means of model abstention and explanation.
arXiv Detail & Related papers (2023-04-20T13:09:28Z) - Approaching Neural Network Uncertainty Realism [53.308409014122816]
Quantifying or at least upper-bounding uncertainties is vital for safety-critical systems such as autonomous vehicles.
We evaluate uncertainty realism -- a strict quality criterion -- with a Mahalanobis distance-based statistical test.
We adapt it to the automotive domain and show that it significantly improves uncertainty realism compared to a plain encoder-decoder model.
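The summary does not state which statistical test is used; one generic way to operationalize such a check (an assumption, not necessarily the paper's exact procedure) is that, if predicted covariances are realistic, the squared Mahalanobis distances of prediction errors should follow a chi-squared distribution with $d$ degrees of freedom, which can be checked with a goodness-of-fit test.

```python
# Generic illustration of a Mahalanobis-based uncertainty realism check.
import numpy as np
from scipy import stats

def mahalanobis_sq(errors: np.ndarray, covariances: np.ndarray) -> np.ndarray:
    """errors: (n, d) prediction errors; covariances: (n, d, d) predicted covariances."""
    d2 = np.empty(len(errors))
    for i, (e, cov) in enumerate(zip(errors, covariances)):
        d2[i] = e @ np.linalg.solve(cov, e)  # e^T cov^{-1} e
    return d2

def uncertainty_realism_pvalue(errors: np.ndarray, covariances: np.ndarray) -> float:
    """Small p-values indicate unrealistic (e.g. overconfident) predicted uncertainty."""
    d = errors.shape[1]
    d2 = mahalanobis_sq(errors, covariances)
    # Compare against the chi-squared(d) reference distribution.
    return stats.kstest(d2, stats.chi2(df=d).cdf).pvalue
```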
arXiv Detail & Related papers (2021-01-08T11:56:12Z)