Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs
- URL: http://arxiv.org/abs/2511.15921v1
- Date: Wed, 19 Nov 2025 23:09:26 GMT
- Title: Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs
- Authors: Chelsea Zou, Yiheng Yao, Basant Khalil
- Abstract summary: This project develops a self-correcting framework for large language models (LLMs). Rather than relying solely on final-answer correctness, our approach leverages fine-grained uncertainty signals. We design a composite reward function that penalizes unjustified high confidence and entropy spikes.
- Score: 0.4115305983711515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This project develops a self-correcting framework for large language models (LLMs) that detects and mitigates hallucinations during multi-step reasoning. Rather than relying solely on final-answer correctness, our approach leverages fine-grained uncertainty signals: 1) self-assessed confidence alignment, and 2) token-level entropy spikes, to detect unreliable and unfaithful reasoning in real time. We design a composite reward function that penalizes unjustified high confidence and entropy spikes while encouraging stable and accurate reasoning trajectories. These signals guide a reinforcement learning (RL) policy that makes the model more introspective and shapes its generation behavior through confidence-aware reward feedback, improving not just outcome correctness but also the coherence and faithfulness of its intermediate reasoning steps. Experiments show that our method improves both final-answer accuracy and reasoning calibration, with ablations validating the individual contribution of each signal.
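The abstract does not publish the exact reward formula, so the sketch below is an illustrative reading of its two signals: token-level entropy spikes flagged against a trajectory-level threshold, and a penalty for stated confidence that is misaligned with the outcome. The spike test (mean + 2·std), the penalty weights, and all function names are assumptions, not the paper's implementation.

```python
import numpy as np

def token_entropies(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) of each token's next-token distribution."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

def entropy_spikes(entropies: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Flag tokens whose entropy exceeds mean + k * std over the trajectory
    (an assumed spike criterion; the paper does not specify one)."""
    return entropies > entropies.mean() + k * entropies.std()

def composite_reward(correct: bool, stated_conf: float, entropies: np.ndarray,
                     lam_conf: float = 1.0, lam_spike: float = 0.5) -> float:
    """Outcome reward minus penalties for (a) confidence misaligned with the
    actual outcome and (b) entropy spikes along the reasoning trajectory."""
    outcome = 1.0 if correct else 0.0
    conf_penalty = lam_conf * abs(stated_conf - outcome)
    spike_penalty = lam_spike * entropy_spikes(entropies).mean()
    return outcome - conf_penalty - spike_penalty
```

Under this reading, a confidently wrong trajectory is penalized twice (outcome and misalignment), while a correct, calm, well-calibrated one scores highest.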
Related papers
- Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution [79.98699884805636]
Reasoning Execution by Multiple Listeners (REMUL) is a multi-party reinforcement learning approach. REMUL builds on the hypothesis that reasoning traces which other parties can follow will be more faithful. Speakers are rewarded for producing reasoning that is clear to listeners.
arXiv Detail & Related papers (2026-02-18T02:55:55Z) - Are Reasoning LLMs Robust to Interventions on Their Chain-of-Thought? [79.86483056611105]
Reasoning LLMs generate step-by-step chains of thought before giving an answer. How robust are these reasoning traces to disruptions that occur within them? We introduce a controlled evaluation framework that perturbs a model's own CoT at fixed timesteps.
arXiv Detail & Related papers (2026-02-07T10:02:58Z) - Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models [59.6715047267181]
Small reasoning models (SRMs) are prone to hallucinations, especially in intermediate reasoning steps. Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained chain-of-thought evaluation. We propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model.
arXiv Detail & Related papers (2026-02-05T17:15:12Z) - EDIS: Diagnosing LLM Reasoning via Entropy Dynamics [3.858418431840288]
We show that the temporal evolution of confidence during generation carries richer information than aggregate statistics alone. We introduce the Entropy Dynamics Instability Score (EDIS), a trajectory-level metric quantifying instability in entropy evolution.
arXiv Detail & Related papers (2026-02-01T15:43:50Z) - Agentic Uncertainty Quantification [76.94013626702183]
We propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary.
arXiv Detail & Related papers (2026-01-22T07:16:26Z) - Confidence over Time: Confidence Calibration with Temporal Logic for Large Language Model Reasoning [0.058633603884542605]
We propose to characterize the stepwise confidence signal using Signal Temporal Logic (STL). Using a discriminative STL mining procedure, we discover temporal formulas that distinguish the confidence signals of correct and incorrect responses. We develop a confidence estimation approach that informs STL blocks with parameter hypernetworks.
arXiv Detail & Related papers (2026-01-19T20:48:06Z) - Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency [78.91846841708586]
We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. We propose Neighbor-Consistency Belief (NCB), a structural measure of belief that evaluates response coherence across a conceptual neighborhood. We also present Structure-Aware Training (SAT), which optimizes a context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%.
arXiv Detail & Related papers (2026-01-09T16:23:21Z) - Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking [11.763473690046721]
Reasoning-augmented vision language models generate explicit chains of thought that promise greater capability and transparency. Models may reach correct answers via visually unfaithful intermediate steps, or reason faithfully yet fail on the final prediction. We introduce the visual faithfulness of reasoning chains as a distinct evaluation dimension, focusing on whether the perception steps of a reasoning chain are grounded in the image.
arXiv Detail & Related papers (2025-12-13T07:04:42Z) - Trace Length is a Simple Uncertainty Signal in Reasoning Models [18.432200654999082]
We show that reasoning trace length is a useful confidence estimator in large reasoning models. Our work reveals that reasoning post-training fundamentally alters the relationship between trace length and accuracy. We identify high-entropy or "forking" tokens as playing a key role in the mechanism.
arXiv Detail & Related papers (2025-10-12T02:04:06Z) - Improving Metacognition and Uncertainty Communication in Language Models [13.389881635116472]
Large language models (LLMs) are increasingly used in decision-making contexts. LLMs' confidence is often miscalibrated and poorly discriminates between correct and incorrect answers. We investigate whether supervised fine-tuning can improve models' ability to communicate uncertainty.
arXiv Detail & Related papers (2025-09-30T19:50:02Z) - Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty [59.97939500426759]
This paper describes RLCR, an approach to training reasoning models that jointly improves accuracy and confidence estimation. We show that across diverse datasets, RLCR substantially improves calibration with no loss in accuracy. We also demonstrate that verbalized confidence can be leveraged at test time to improve accuracy and calibration.
arXiv Detail & Related papers (2025-07-22T17:56:01Z) - How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models [28.62988505317048]
Large language models (LLMs) exhibit strikingly conflicting behaviors. LLMs can appear steadfastly overconfident in their initial answers whilst being prone to excessive doubt when challenged. We show that LLMs exhibit a pronounced choice-supportive bias that reinforces and boosts their estimate of confidence in their answer.
arXiv Detail & Related papers (2025-07-03T18:57:43Z) - Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision [12.287123198288079]
Uncertainty calibration is essential for the safe deployment of large language models (LLMs). We find that supervised fine-tuning with scalar confidence labels alone suffices to elicit self-verification behavior in language models. We propose a simple rethinking method that boosts performance via test-time scaling based on calibrated uncertainty.
arXiv Detail & Related papers (2025-06-04T08:56:24Z) - Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models [83.24079543652253]
Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization. However, reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations. We propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification.
arXiv Detail & Related papers (2025-05-30T14:23:32Z) - Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards [67.86091419220816]
Large Language Models (LLMs) show great promise in complex reasoning. A prevalent issue is "superficial self-reflection", where models fail to robustly verify their own outputs. We introduce RISE (Reinforcing Reasoning with Self-Verification), a novel online RL framework designed to tackle this issue.
arXiv Detail & Related papers (2025-05-19T17:59:31Z) - ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning [64.93140713419561]
Large Reasoning Models (LRMs) perform strongly on complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often suffer from verbose outputs. Existing fine-tuning-based compression methods either perform post-hoc pruning, risking disruption to reasoning coherence, or rely on sampling-based selection. We introduce ConCISE, a framework designed to generate concise reasoning chains, integrating Confidence Injection to boost reasoning confidence and Early Stopping to terminate reasoning when confidence is sufficient.
arXiv Detail & Related papers (2025-05-08T01:40:40Z) - Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models [34.59785123314865]
A safe and trustworthy use of Large Language Models (LLMs) requires an accurate expression of confidence in their answers. We propose a novel Reinforcement Learning approach that directly fine-tunes LLMs to express calibrated confidence estimates alongside their answers to factual questions.
arXiv Detail & Related papers (2025-03-04T13:48:50Z) - Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
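Several of the listed papers share an uncertainty-gated answer-or-reject pattern: emit the answer only when a confidence signal clears a threshold, otherwise abstain. A minimal sketch follows; the geometric-mean token-probability confidence proxy and the 0.7 threshold are illustrative assumptions, not taken from any paper above.

```python
import math

def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability as a crude sequence-level confidence."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def answer_or_abstain(answer: str, token_logprobs: list[float],
                      threshold: float = 0.7) -> str:
    """Return the answer only when confidence clears the threshold;
    otherwise reject the output, as in uncertainty-aware generation."""
    if sequence_confidence(token_logprobs) >= threshold:
        return answer
    return "[abstain: low confidence]"
```

In the uncertainty-aware in-context learning setting, the abstain branch could instead trigger retrieval of additional demonstrations before retrying.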
This list is automatically generated from the titles and abstracts of the papers in this site.