Improving Metacognition and Uncertainty Communication in Language Models
- URL: http://arxiv.org/abs/2510.05126v2
- Date: Tue, 21 Oct 2025 21:46:32 GMT
- Title: Improving Metacognition and Uncertainty Communication in Language Models
- Authors: Mark Steyvers, Catarina Belem, Padhraic Smyth
- Abstract summary: Large language models (LLMs) are increasingly used in decision-making contexts. LLMs' confidence is often miscalibrated and poorly discriminates between correct and incorrect answers. We investigate whether supervised fine-tuning can improve models' ability to communicate uncertainty.
- Score: 13.389881635116472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly used in decision-making contexts, but when they present answers without signaling low confidence, users may unknowingly act on erroneous outputs. Prior work shows that LLMs maintain internal uncertainty signals, yet their expressed confidence is often miscalibrated and poorly discriminates between correct and incorrect answers. We investigate whether supervised fine-tuning can improve models' ability to communicate uncertainty and whether such improvements generalize across tasks and domains. We fine-tune LLMs on datasets spanning general knowledge, mathematics, and open-ended trivia, and evaluate two metacognitive tasks: (1) single-question confidence estimation, where the model assigns a numeric certainty to its answer, and (2) pairwise confidence comparison, where the model selects which of two answers it is more likely to answer correctly. We assess generalization to unseen domains, including medical and legal reasoning. Results show that fine-tuning improves calibration (alignment between stated confidence and accuracy) and discrimination (higher confidence for correct vs. incorrect responses) within and across domains. However, gains are task-specific: training on single-question calibration does not transfer to pairwise comparison, and vice versa. Multitask fine-tuning yields broader gains, lowering calibration error and strengthening discrimination in out-of-domain evaluations. This suggests that uncertainty communication in LLMs is trainable but requires multitask training to generalize effectively.
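The abstract's two evaluation notions, calibration and discrimination, can be illustrated with a small sketch. This is not the authors' evaluation code: the abstract does not name the specific metrics used, so the example assumes two common choices, expected calibration error (ECE) for calibration and the area under the ROC curve (AUC) for discrimination. It also assumes confidences are expressed as numbers in [0, 1]. The function names and the bin count are hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    conf: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Weighted mean gap between stated confidence and accuracy per bin.

    Assumes confidences lie in [0, 1]; equal-width bins are an assumption.
    """
    if len(conf) != len(correct) or not conf:
        raise ValueError("conf and correct must be non-empty and equal length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, y in zip(conf, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    total = len(conf)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, y in b if y) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece


def discrimination_auc(conf: Sequence[float], correct: Sequence[bool]) -> float:
    """Probability that a correct answer gets higher confidence than an
    incorrect one, counting ties as one half (pairwise form of AUC)."""
    pos = [c for c, y in zip(conf, correct) if y]
    neg = [c for c, y in zip(conf, correct) if not y]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: stated confidences and whether each answer was correct.
    confidences = [0.95, 0.85, 0.65, 0.35]
    is_correct = [True, True, False, False]
    print("ECE:", expected_calibration_error(confidences, is_correct))
    print("AUC:", discrimination_auc(confidences, is_correct))
```

Lower ECE means stated confidence tracks accuracy more closely. An AUC of 0.5 means confidence carries no information about correctness, and higher values mean better discrimination.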
Related papers
- On Calibration of Large Language Models: From Response To Capability [66.59139960234326]
Large language models (LLMs) are widely deployed as general-purpose problem solvers. We introduce capability calibration, which targets the model's expected accuracy on a query. Our results demonstrate that capability-calibrated confidence improves pass@$k$ prediction and inference budget allocation.
arXiv Detail & Related papers (2026-02-14T01:07:45Z)
- ConfTuner: Training Large Language Models to Express Their Confidence Verbally [58.63318088243125]
Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as science, law, and healthcare. LLMs are often observed to generate incorrect answers with high confidence, a phenomenon known as "overconfidence".
arXiv Detail & Related papers (2025-08-26T09:25:32Z)
- Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty [59.97939500426759]
This paper describes RLCR, an approach to training reasoning models that jointly improves accuracy and confidence estimation. We show that across diverse datasets, RLCR substantially improves calibration with no loss in accuracy. We also demonstrate that verbalized confidence can be leveraged at test time to improve accuracy and calibration.
arXiv Detail & Related papers (2025-07-22T17:56:01Z)
- Object-Level Verbalized Confidence Calibration in Vision-Language Models via Semantic Perturbation [26.580361841501514]
Vision-language models (VLMs) excel in various multimodal tasks but frequently suffer from poor calibration. This miscalibration undermines user trust, especially when models confidently provide incorrect or fabricated information. We propose a novel Confidence through Semantic Perturbation (CSP) framework to improve the calibration of verbalized confidence for object-centric queries.
arXiv Detail & Related papers (2025-04-21T04:01:22Z)
- Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence [16.311538811237536]
Large language models (LLMs) are increasingly used for factual question-answering. For these verbalized expressions of uncertainty to be meaningful, they should reflect the error rates at the expressed level of confidence. We propose a simple procedure, uncertainty distillation, to teach an LLM to express calibrated semantic confidences.
arXiv Detail & Related papers (2025-03-18T21:29:29Z)
- Fact-Level Confidence Calibration and Self-Correction [64.40105513819272]
We propose a Fact-Level framework that calibrates confidence to relevance-weighted correctness at the fact level.
We also develop Confidence-Guided Fact-level Self-Correction (ConFix), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones.
arXiv Detail & Related papers (2024-11-20T14:15:18Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models [69.68379406317682]
We introduce a listener-aware finetuning method (LACIE) to calibrate implicit and explicit confidence markers.
We show that LACIE models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener.
We find that training with LACIE results in 47% fewer incorrect answers being accepted while maintaining the same level of acceptance for correct answers.
arXiv Detail & Related papers (2024-05-31T17:16:38Z)
- What Large Language Models Know and What People Think They Know [13.939511057660013]
Large language models (LLMs) are increasingly integrated into decision-making processes. To earn human trust, LLMs must be well calibrated so that they can accurately assess and communicate the likelihood of their predictions being correct. Here we explore the calibration gap, which refers to the difference between human confidence in LLM-generated answers and the models' actual confidence, and the discrimination gap, which reflects how well humans and models can distinguish between correct and incorrect answers.
arXiv Detail & Related papers (2024-01-24T22:21:04Z)
- Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)