Internal Representations as Indicators of Hallucinations in Agent Tool Selection
- URL: http://arxiv.org/abs/2601.05214v1
- Date: Thu, 08 Jan 2026 18:38:45 GMT
- Title: Internal Representations as Indicators of Hallucinations in Agent Tool Selection
- Authors: Kait Healy, Bharathi Srinivasan, Visakh Madathil, Jing Wu,
- Abstract summary: Large Language Models (LLMs) have shown remarkable capabilities in tool calling and tool usage. LLMs suffer from hallucinations where they choose incorrect tools, provide malformed parameters, and exhibit 'tool bypass' behavior. We present a computationally efficient framework that detects tool-calling hallucinations in real-time by leveraging LLMs' internal representations.
- Score: 5.21
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown remarkable capabilities in tool calling and tool usage, but they suffer from hallucinations: they choose incorrect tools, provide malformed parameters, and exhibit 'tool bypass' behavior, simulating outputs themselves instead of invoking specialized tools or external systems. This undermines the reliability of LLM-based agents in production systems, as it leads to inconsistent results and bypasses security and audit controls. Such hallucinations in agent tool selection require early detection and error handling. Unlike existing hallucination detection methods that require multiple forward passes or external validation, we present a computationally efficient framework that detects tool-calling hallucinations in real time by leveraging the LLM's internal representations during the same forward pass used for generation. We evaluate this approach on reasoning tasks across multiple domains, demonstrating strong detection performance (up to 86.4% accuracy) while maintaining real-time inference with minimal computational overhead. The framework particularly excels at detecting parameter-level hallucinations and inappropriate tool selections, both critical for reliable agent deployment.
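The listing includes no code, but the mechanism the abstract describes (probing hidden states from the same forward pass used for generation) follows a well-known recipe. Below is a minimal sketch of that recipe, not the authors' implementation: the model name, probe weights `W`/`b`, layer choice, and 0.5 threshold are all illustrative assumptions.

```python
# Minimal sketch: score a generated tool call for hallucination risk using
# hidden states from the same forward pass. Assumes a HuggingFace causal LM;
# the linear probe (W, b) would be trained offline on labeled tool calls.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # illustrative; any causal LM exposing hidden states works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

# Hypothetical probe weights, e.g. logistic regression fit on hidden states
# of correct vs. hallucinated tool calls. Random stand-ins here.
hidden_size = model.config.hidden_size
W = torch.randn(hidden_size)
b = torch.tensor(0.0)

@torch.no_grad()
def hallucination_score(prompt_and_tool_call: str, layer: int = -1) -> float:
    """Probability the tool call is hallucinated, read off the hidden
    state of the final token at one chosen layer."""
    inputs = tok(prompt_and_tool_call, return_tensors="pt")
    out = model(**inputs)
    h = out.hidden_states[layer][0, -1]  # last token, chosen layer
    return torch.sigmoid(h @ W + b).item()

score = hallucination_score(
    'User: convert 30 USD to EUR\nAssistant: {"tool": "get_weather"}'
)
if score > 0.5:  # threshold tuned on validation data
    print("flag tool call for error handling")
```

Because the hidden states are already computed during generation, the only extra cost in this scheme is one inner product per tool call, which is what makes single-pass detection cheap.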
Related papers
- RIVA: Leveraging LLM Agents for Reliable Configuration Drift Detection [3.494935876363005]
Existing agentic systems implicitly assume that the tools they invoke always return correct outputs, making them vulnerable to erroneous tool responses. We introduce RIVA, a novel multi-agent system that performs robust IaC verification even when tools produce incorrect or misleading outputs. Our results show that cross-validation of diverse tool calls enables more reliable autonomous infrastructure verification in production cloud environments.
arXiv Detail & Related papers (2026-03-02T19:28:27Z)
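RIVA's abstract names the mechanism (cross-validating diverse tool calls) without detail. A minimal sketch of that general pattern follows; the three "tools" and the quorum rule are hypothetical stand-ins, not RIVA's design.

```python
# Minimal sketch of cross-validating diverse tool calls, in the spirit of
# RIVA's abstract: accept a configuration reading only when independent
# tools agree, so one erroneous tool response cannot mislead the agent.
from collections import Counter
from typing import Callable

def cross_validate(tools: list[Callable[[], str]], quorum: int = 2) -> str | None:
    """Run independent tools and accept a value only if at least
    `quorum` of them agree; otherwise report the check as unresolved."""
    results = []
    for tool in tools:
        try:
            results.append(tool())
        except Exception:
            continue  # a failing tool simply casts no vote
    value, votes = Counter(results).most_common(1)[0] if results else (None, 0)
    return value if votes >= quorum else None

# Hypothetical readers of one configuration value from different sources.
tools = [
    lambda: "instance_type=m5.large",  # e.g., cloud provider API
    lambda: "instance_type=m5.large",  # e.g., IaC state file
    lambda: "instance_type=t3.micro",  # e.g., a stale cache (erroneous)
]
print(cross_validate(tools))  # -> "instance_type=m5.large"
```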
- ForgeryVCR: Visual-Centric Reasoning via Efficient Forensic Tools in MLLMs for Image Forgery Detection and Localization [62.03035862528452]
ForgeryVCR is a framework that materializes imperceptible traces into explicit visual intermediates via Visual-Centric Reasoning. ForgeryVCR achieves state-of-the-art (SOTA) performance in both detection and localization tasks.
arXiv Detail & Related papers (2026-02-15T11:14:47Z)
- The Bitter Lesson of Diffusion Language Models for Agentic Workflows: A Comprehensive Reality Check [54.08619694620588]
We present a comprehensive evaluation of dLLMs across two distinct agentic paradigms: Embodied Agents and Tool-Calling Agents. Our results on Agentboard and BFCL reveal a "bitter lesson": current dLLMs fail to serve as reliable agentic backbones.
arXiv Detail & Related papers (2026-01-19T11:45:39Z)
- ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration [68.89572566071575]
ET-Agent is a training framework for calibrating an agent's tool-use behavior. It is designed to progressively calibrate erroneous behavioral patterns toward optimal behaviors.
arXiv Detail & Related papers (2026-01-11T11:05:26Z)
- Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders [39.5490415037017]
Retrieval-Augmented Generation (RAG) improves the factuality of large language models (LLMs) by grounding outputs in retrieved evidence. Existing hallucination detection methods for RAG often rely on large-scale detector training. We introduce RAGLens, a lightweight hallucination detector that accurately flags unfaithful RAG outputs.
arXiv Detail & Related papers (2025-12-09T18:33:22Z)
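The RAGLens abstract only says the detector is lightweight and built on sparse autoencoders, so the sketch below is a generic version of that recipe: encode a hidden state with a (pretrained) SAE, then score faithfulness with a small linear head. The SAE weights, dimensions, and calibration data here are random stand-ins, scaled down for illustration.

```python
# Minimal sketch of an SAE-feature hallucination probe in the spirit of
# RAGLens' abstract. A real system would load pretrained SAE weights and
# fit the classifier on labeled faithful/unfaithful RAG outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model, d_sae = 512, 2048  # scaled-down sizes, illustrative only

# Stand-ins for a pretrained SAE encoder (W_enc, b_enc).
W_enc = rng.standard_normal((d_model, d_sae)) * 0.02
b_enc = np.zeros(d_sae)

def sae_features(h: np.ndarray) -> np.ndarray:
    """ReLU sparse code of a residual-stream activation."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

# Hypothetical calibration set: hidden states of RAG answers labeled
# faithful (0) or hallucinated (1).
H = rng.standard_normal((200, d_model))
y = rng.integers(0, 2, size=200)

clf = LogisticRegression(max_iter=1000)
clf.fit(np.stack([sae_features(h) for h in H]), y)

h_new = rng.standard_normal(d_model)
print("p(hallucinated) =", clf.predict_proba(sae_features(h_new)[None])[0, 1])
```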
- How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench [58.114899897566964]
In a multi-turn conversational environment, large language models (LLMs) often struggle with consistent reasoning and adherence to domain-specific policies. We propose the Input-Reformulation Multi-Agent (IRMA) framework, which automatically reformulates user queries augmented with relevant domain rules. IRMA significantly outperforms ReAct, Function Calling, and Self-Reflection by 16.1%, 12.7%, and 19.1%, respectively.
arXiv Detail & Related papers (2025-08-28T15:57:33Z)
- More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents [24.84276066855418]
This study investigates whether agents are vulnerable to errors throughout the entire tool invocation process. We observe that agents are highly susceptible to errors at each stage, and that agents based on open-source models are more vulnerable than those based on proprietary models.
arXiv Detail & Related papers (2025-06-27T07:13:29Z)
- Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation [78.78421340836915]
We systematically investigate reference-free hallucination detection in open-domain long-form responses. Our findings reveal that internal states are insufficient for reliably distinguishing between factual and hallucinated content. We introduce a new paradigm, named RATE-FT, that augments fine-tuning with an auxiliary task for the model to jointly learn with the main task of hallucination detection.
arXiv Detail & Related papers (2025-05-18T07:10:03Z)
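The RATE-FT abstract names the paradigm (joint fine-tuning with an auxiliary task) without specifying the task or weighting, so the following is a generic multi-task loss sketch; the auxiliary head, label sets, and weight `lam` are assumptions.

```python
# Generic multi-task fine-tuning objective in the spirit of RATE-FT's
# abstract: the model jointly optimizes hallucination detection (main
# task) and an auxiliary task via a weighted sum of cross-entropies.
import torch

def joint_loss(main_logits, main_labels, aux_logits, aux_labels, lam=0.5):
    """L = L_main + lam * L_aux, both as cross-entropy."""
    ce = torch.nn.functional.cross_entropy
    return ce(main_logits, main_labels) + lam * ce(aux_logits, aux_labels)

# Toy shapes: batch of 8, binary detection head, 4-way auxiliary head.
main_logits = torch.randn(8, 2, requires_grad=True)
aux_logits = torch.randn(8, 4, requires_grad=True)
loss = joint_loss(main_logits, torch.randint(0, 2, (8,)),
                  aux_logits, torch.randint(0, 4, (8,)))
loss.backward()  # an optimizer step would follow in real fine-tuning
```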
- Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger [49.81945268343162]
We propose MeCo, an adaptive decision-making strategy for external tool use. MeCo quantifies metacognitive scores by capturing high-level cognitive signals in the representation space. MeCo is fine-tuning-free and incurs minimal cost.
arXiv Detail & Related papers (2025-02-18T15:45:01Z)
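MeCo's abstract says the metacognitive score is read from the representation space and gates tool use without fine-tuning. A minimal sketch of that gating pattern follows; the probe direction `v` and the threshold are assumptions that would be fit on a small labeled set.

```python
# Minimal sketch of representation-based tool gating in the spirit of
# MeCo's abstract: project a hidden state onto a "needs-a-tool" direction
# and invoke external tools only above a threshold.
import numpy as np

d_model = 512
v = np.random.default_rng(1).standard_normal(d_model)  # stand-in probe direction
v /= np.linalg.norm(v)

def meta_score(hidden_state: np.ndarray) -> float:
    """Scalar metacognitive signal from the representation space."""
    return float(hidden_state @ v)

def decide_tool_use(hidden_state: np.ndarray, threshold: float = 0.0) -> str:
    return "call_tool" if meta_score(hidden_state) > threshold else "answer_directly"

print(decide_tool_use(np.random.default_rng(2).standard_normal(d_model)))
```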
- Reducing Tool Hallucination via Reliability Alignment [31.761771794788462]
Large Language Models (LLMs) have expanded their capabilities beyond language generation to interact with external tools, enabling automation and real-world applications. Tool hallucinations, where models either select inappropriate tools or misuse them, pose significant challenges, leading to erroneous task execution, increased computational costs, and reduced system reliability. We introduce RelyToolBench, which integrates specialized test cases and novel metrics to assess hallucination-aware task success and efficiency. Finally, we propose Relign, a reliability alignment framework that expands the tool-use action space to include indecisive actions, allowing LLMs to defer tool use, seek clarification, or adjust tool selection.
arXiv Detail & Related papers (2024-12-05T13:10:54Z)
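The Relign abstract lists the indecisive actions (defer, clarify, adjust) but not how they are chosen. The sketch below only illustrates what an expanded action space looks like; the dispatch rules and confidence thresholds are illustrative assumptions.

```python
# Minimal sketch of an expanded tool-use action space in the spirit of
# Relign's abstract: besides calling a tool, the agent may take
# "indecisive" actions instead of forcing a possibly wrong tool call.
from enum import Enum, auto

class Action(Enum):
    CALL_TOOL = auto()
    DEFER_TOOL_USE = auto()      # answer without invoking a tool
    SEEK_CLARIFICATION = auto()  # ask the user a question first
    ADJUST_SELECTION = auto()    # reconsider which tool to call

def choose_action(tool_confidence: float, query_is_ambiguous: bool) -> Action:
    if query_is_ambiguous:
        return Action.SEEK_CLARIFICATION
    if tool_confidence > 0.8:
        return Action.CALL_TOOL
    if tool_confidence > 0.4:
        return Action.ADJUST_SELECTION
    return Action.DEFER_TOOL_USE

print(choose_action(tool_confidence=0.9, query_is_ambiguous=False))  # CALL_TOOL
```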
- Learning to Ask: When LLM Agents Meet Unclear Instruction [55.65312637965779]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone. We evaluate the performance of LLM tool-use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench. We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z)