The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination

URL: http://arxiv.org/abs/2510.22977v1
Date: Mon, 27 Oct 2025 03:58:29 GMT
Title: The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination
Authors: Chenlong Yin, Zeyang Sha, Shiwen Cui, Changhua Meng,
Abstract summary: We show that progressively enhancing reasoning through Reasoning RL increases tool hallucination proportionally with task performance gains. Mechanistically, Reasoning RL disproportionately collapses tool-reliability-related representations, and hallucinations surface as amplified divergences concentrated in late-layer residual streams.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Enhancing the reasoning capabilities of Large Language Models (LLMs) is a key strategy for building Agents that "think then act." However, recent observations, such as those on OpenAI's o3, suggest a paradox: stronger reasoning often coincides with increased hallucination, yet no prior work has systematically examined whether reasoning enhancement itself causes tool hallucination. To address this gap, we pose the central question: Does strengthening reasoning increase tool hallucination? To answer this, we introduce SimpleToolHalluBench, a diagnostic benchmark measuring tool hallucination in two failure modes: (i) no tool available, and (ii) only distractor tools available. Through controlled experiments, we establish three key findings. First, we demonstrate a causal relationship: progressively enhancing reasoning through RL increases tool hallucination proportionally with task performance gains. Second, this effect transcends overfitting: training on non-tool tasks (e.g., mathematics) still amplifies subsequent tool hallucination. Third, the effect is method-agnostic, appearing both when reasoning is instilled via supervised fine-tuning and when it is merely elicited at inference by switching from direct answers to step-by-step thinking. We also evaluate mitigation strategies including prompt engineering and Direct Preference Optimization (DPO), revealing a fundamental reliability-capability trade-off: reducing hallucination consistently degrades utility. Mechanistically, Reasoning RL disproportionately collapses tool-reliability-related representations, and hallucinations surface as amplified divergences concentrated in late-layer residual streams. These findings reveal that current reasoning enhancement methods inherently amplify tool hallucination, highlighting the need for new training objectives that jointly optimize for capability and reliability.
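As a rough illustration of how the two failure modes could be scored, here is a minimal Python sketch. It assumes that in both settings no appropriate tool exists, so any tool call counts as a hallucination. The `Episode` record, the `"no_tool"`/`"distractor"` labels, and the split into fabricated-tool versus distractor-misuse calls are hypothetical, not SimpleToolHalluBench's actual harness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Episode:
    # Hypothetical record of one benchmark item and the model's response.
    mode: str                   # "no_tool" or "distractor" (assumed labels)
    available_tools: list       # tool names offered; empty in "no_tool" mode
    called_tool: Optional[str]  # tool the model tried to invoke, or None if it declined


def classify(ep: Episode) -> str:
    """Assumed rule: no suitable tool exists in either mode, so any call is a hallucination."""
    if ep.called_tool is None:
        return "ok"
    if ep.called_tool in ep.available_tools:
        return "distractor_misuse"  # used an offered but irrelevant tool
    return "fabricated_tool"        # invented a tool that was never offered


def hallucination_rate(episodes, mode: str) -> float:
    """Fraction of episodes in `mode` whose response is not labeled "ok"."""
    subset = [ep for ep in episodes if ep.mode == mode]
    if not subset:
        return 0.0
    return sum(classify(ep) != "ok" for ep in subset) / len(subset)


episodes = [
    Episode("no_tool", [], None),
    Episode("no_tool", [], "web_search"),
    Episode("distractor", ["get_weather"], "get_weather"),
    Episode("distractor", ["get_weather"], None),
]
print(hallucination_rate(episodes, "no_tool"))     # 0.5
print(hallucination_rate(episodes, "distractor"))  # 0.5
```

Reporting the two modes separately mirrors the abstract's split; under this assumed rule, a model that always declines scores 0.0 in both.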

Related papers

Seeing Through the Chain: Mitigate Hallucination in Multimodal Reasoning Models via CoT Compression and Contrastive Preference Optimization
Multimodal reasoning models (MLRMs) remain prone to hallucinations, and effective solutions are still underexplored. We propose C3PO, a training-based mitigation framework comprising Chain-of-Thought Compression and Contrastive Preference Optimization.
arXiv (2026-02-03T11:00:55Z)
Reasoning and Tool-use Compete in Agentic RL: From Quantifying Interference to Disentangled Tuning
Agentic Reinforcement Learning (ARL) focuses on training large language models to interleave reasoning with external tool execution to solve complex tasks. Most existing ARL methods train a single set of shared model parameters to support both reasoning and tool-use behaviors, implicitly assuming that joint training improves overall agent performance. We show that these two capabilities often induce misaligned gradient directions, leading to training interference that undermines the effectiveness of joint optimization. We propose Disentangled Action Reasoning Tuning (DART), a simple and efficient framework that explicitly decouples parameter updates for reasoning and tool use via separate low-rank …
arXiv (2026-02-01T03:19:22Z)
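The DART summary breaks off at "separate low-rank", so the following is only a guess at the shape of the idea: a Python sketch of a frozen linear layer with two LoRA-style low-rank adapters, where a per-token label picks which adapter applies. Because each token's output depends on only one adapter, backpropagating that token's loss would update only that adapter. `DecoupledLowRankLinear`, the `"reasoning"`/`"tool"` labels, and the routing rule are hypothetical.

```python
def matvec(M, x):
    """Plain-Python matrix-vector product (M is a list of rows)."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class DecoupledLowRankLinear:
    """Frozen weight W plus one low-rank update B @ A per behaviour (hypothetical sketch)."""

    def __init__(self, W, adapters):
        self.W = W                # d_out x d_in, kept frozen
        self.adapters = adapters  # {"reasoning": (A, B), "tool": (A, B)}; A is r x d_in, B is d_out x r

    def forward(self, x, token_kind):
        A, B = self.adapters[token_kind]  # only this adapter participates for this token
        base = matvec(self.W, x)
        delta = matvec(B, matvec(A, x))   # rank-r update B(Ax)
        return [b + d for b, d in zip(base, delta)]


layer = DecoupledLowRankLinear(
    W=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "reasoning": ([[1.0, 0.0]], [[1.0], [0.0]]),
        "tool": ([[0.0, 1.0]], [[0.0], [1.0]]),
    },
)
print(layer.forward([1.0, 2.0], "reasoning"))  # [2.0, 2.0]
print(layer.forward([1.0, 2.0], "tool"))       # [1.0, 4.0]
```

Keeping `W` frozen and giving each behaviour its own `(A, B)` pair is one way to stop the two objectives from pushing on the same parameters, which is the interference the summary describes.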
Mitigating Hallucination in Multimodal Reasoning via Functional Attention Control
Hallucination remains a persistent failure mode, manifesting itself as erroneous reasoning chains and misinterpretation of visual content. In this study, we observe that attention heads exhibit a staged division: shallow heads predominantly serve perception, while deeper heads shift toward symbolic reasoning. We propose a lightweight and interpretable two-step plugin, Functional Head Identification and Class-language Rescaling, which locates perception- and reasoning-oriented heads and regulates their contributions without retraining.
arXiv (2025-10-11T16:54:41Z)
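To make "regulates their contributions without retraining" concrete, here is a minimal Python sketch that rescales per-head attention outputs by role before the heads are concatenated. The role labels, the scale factors, and the assumption that rescaling acts on each head's output vector are illustrative choices, not the paper's published procedure; how heads receive their roles (the identification step) is taken as given.

```python
def rescale_heads(head_outputs, head_roles, scales):
    """Scale each head's output vector by a factor chosen from its role.

    head_outputs: list of per-head output vectors at one token position
    head_roles:   one label per head, e.g. "perception" or "reasoning" (assumed labels)
    scales:       role -> multiplier; heads whose role is missing keep factor 1.0
    """
    if len(head_outputs) != len(head_roles):
        raise ValueError("need exactly one role per head")
    return [
        [scales.get(role, 1.0) * v for v in out]
        for out, role in zip(head_outputs, head_roles)
    ]


def concat_heads(head_outputs):
    """Concatenate head outputs, as multi-head attention does before its output projection."""
    return [v for out in head_outputs for v in out]


heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
roles = ["perception", "reasoning", "other"]
scaled = rescale_heads(heads, roles, {"perception": 1.5, "reasoning": 0.5})
print(concat_heads(scaled))  # [1.5, 3.0, 1.5, 2.0, 5.0, 6.0]
```

Since only activations are multiplied, the weights stay untouched; in a real model this would typically be wired in with a forward hook on each attention layer (e.g. a PyTorch forward hook).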
Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations
Hallucination mitigation methods are mainly based on preference alignment and require external human annotations or auxiliary models for preference data collection. We propose Autonomous Preference Alignment via Self-Injection (APASI), a novel and generalizable method that mitigates hallucinations without external dependencies. APASI leverages the target LVLM to self-inject hallucinations into a generated response, creating a pair of responses with varying preference levels.
arXiv (2025-09-14T14:26:53Z)
Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet
Test-time scaling increases inference-time computation by allowing models to generate long reasoning chains. We show that this approach is not yet effective for knowledge-intensive tasks, where high factual accuracy and low hallucination rates are essential. Our results reveal that increasing test-time computation does not consistently improve accuracy and, in many cases, even leads to more hallucinations.
arXiv (2025-09-08T16:28:25Z)
The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models
Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization. However, reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations. We propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification.
arXiv (2025-05-30T14:23:32Z)
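A minimal Python sketch of what step-wise, factuality-aware rewards could look like: each reasoning step receives the outcome reward plus a bonus or penalty from a factuality verifier. The three-way verdict labels, the ±1 scoring, and the `weight` coefficient are assumptions for illustration; the summary does not state how FSPO combines the two signals.

```python
FACTUALITY_SCORE = {"supported": 1.0, "unverifiable": 0.0, "contradicted": -1.0}  # assumed scale


def stepwise_rewards(outcome_reward, step_verdicts, weight=0.5):
    """Return one reward per reasoning step.

    outcome_reward: scalar reward for the final answer (e.g. 1.0 if correct, else 0.0)
    step_verdicts:  one verifier label per step, keys of FACTUALITY_SCORE
    weight:         hypothetical coefficient on the factuality term
    """
    return [outcome_reward + weight * FACTUALITY_SCORE[v] for v in step_verdicts]


print(stepwise_rewards(1.0, ["supported", "contradicted", "unverifiable"]))
# [1.5, 0.5, 1.0]
```

With a per-step term, a correct final answer no longer earns full credit for a step the verifier contradicts, which is one way to counter the hallucination increase the summary attributes to reasoning-oriented RL fine-tuning.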
MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM
Multimodal hallucinations are multi-sourced and arise from diverse causes. Existing benchmarks fail to adequately distinguish between perception-induced hallucinations and reasoning-induced hallucinations.
arXiv (2025-05-30T05:54:36Z)
Are Reasoning Models More Prone to Hallucination?
Recently evolved large reasoning models (LRMs) show powerful performance in solving complex tasks with long chain-of-thought (CoT) reasoning capability. Are reasoning models more prone to hallucination? This paper addresses the question from three perspectives.
arXiv (2025-05-29T16:53:41Z)
Auditing Meta-Cognitive Hallucinations in Reasoning Large Language Models
We study the causality of hallucinations under constrained knowledge domains by auditing the Chain-of-Thought trajectory. Our analysis reveals that in long-CoT settings, RLLMs can iteratively reinforce biases and errors through flawed reflective reasoning. Surprisingly, even direct interventions at the origin of hallucinations often fail to reverse their effects.
arXiv (2025-05-19T14:11:09Z)
Detection and Mitigation of Hallucination in Large Reasoning Models: A Mechanistic Perspective
Reasoning Hallucinations are logically coherent but factually incorrect reasoning traces. These errors are embedded within structured reasoning, making them more difficult to detect and potentially more harmful. We propose the Reasoning Score, which quantifies the depth of reasoning by measuring the divergence between logits. We also introduce GRPO-R, an enhanced reinforcement learning algorithm that incorporates step-level deep reasoning rewards via potential-based shaping.
arXiv (2025-05-19T09:16:40Z)
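Potential-based shaping has a standard form, r'_t = r_t + γΦ(s_{t+1}) − Φ(s_t), and GRPO normalizes each sampled response's return within its group. The Python sketch below shows both pieces; using a per-step score such as a reasoning-depth estimate as the potential Φ, the example potential values, and this exact combination are assumptions, since the summary does not spell out GRPO-R's reward.

```python
import statistics


def shaped_rewards(rewards, potentials, gamma=1.0):
    """Potential-based shaping: r_t + gamma * phi(s_{t+1}) - phi(s_t).

    rewards:    per-step rewards r_0 .. r_{T-1}
    potentials: phi(s_0) .. phi(s_T), one more entry than rewards
    """
    if len(potentials) != len(rewards) + 1:
        raise ValueError("need len(rewards) + 1 potentials")
    return [r + gamma * potentials[t + 1] - potentials[t] for t, r in enumerate(rewards)]


def group_relative_advantages(returns, eps=1e-8):
    """GRPO-style advantage: (return - group mean) / (group std + eps)."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    return [(g - mean) / (std + eps) for g in returns]


# One trajectory: sparse final reward, hypothetical per-step depth potentials.
shaped = shaped_rewards([0.0, 0.0, 1.0], [0.0, 0.2, 0.5, 0.5])
print([round(x, 6) for x in shaped])  # [0.2, 0.3, 1.0]

# A group of four sampled responses with their (shaped) returns.
print([round(a, 6) for a in group_relative_advantages([1.0, 0.0, 0.0, 1.0])])
# [1.0, -1.0, -1.0, 1.0]
```

With γ = 1 the added terms telescope to Φ(s_T) − Φ(s_0) over a trajectory; potential-based shaping is known to preserve the optimal policy while giving denser per-step feedback.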

This list is automatically generated from the titles and abstracts of the papers on this site.