Investigating Tool-Memory Conflicts in Tool-Augmented LLMs
- URL: http://arxiv.org/abs/2601.09760v1
- Date: Wed, 14 Jan 2026 03:44:36 GMT
- Title: Investigating Tool-Memory Conflicts in Tool-Augmented LLMs
- Authors: Jiali Cheng, Rui Pan, Hadi Amiri,
- Abstract summary: We propose a new type of knowledge conflict -- Tool-Memory Conflict (TMC)<n>TMC is where the internal parametric knowledge contradicts with the external tool knowledge for tool-augmented LLMs.
- Score: 20.760278842357625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tool-augmented large language models (LLMs) have powered many applications. However, they are likely to suffer from knowledge conflict. In this paper, we propose a new type of knowledge conflict -- Tool-Memory Conflict (TMC), where the internal parametric knowledge contradicts with the external tool knowledge for tool-augmented LLMs. We find that existing LLMs, though powerful, suffer from TMC, especially on STEM-related tasks. We also uncover that under different conditions, tool knowledge and parametric knowledge may be prioritized differently. We then evaluate existing conflict resolving techniques, including prompting-based and RAG-based methods. Results show that none of these approaches can effectively resolve tool-memory conflicts.
Related papers
- It's LIT! Reliability-Optimized LLMs with Inspectable Tools [33.53798264548128]
Large language models (LLMs) have exhibited remarkable capabilities across various domains.<n>LLMs often follow an opaque reasoning process, which limits their usefulness in high-stakes domains.<n>We present a framework built on the tool-calling capabilities of existing LLMs to enable them to select the most reliable and easy-to-troubleshoot solution path.
arXiv Detail & Related papers (2025-11-18T20:41:58Z) - Alignment for Efficient Tool Calling of Large Language Models [34.748897353548756]
Large language models (LLMs) can integrate external tools, enhancing their task performance by expanding their knowledge boundaries.<n>However, relying on tools often introduces tradeoffs between performance, speed, and cost.<n>This paper addresses the challenge of aligning LLMs with their knowledge boundaries to make more intelligent decisions about tool invocation.
arXiv Detail & Related papers (2025-03-09T17:55:49Z) - Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering [23.96385393039587]
Large language models (LLMs) can store a significant amount of factual knowledge in their parameters.<n>LLMs can internally register the signals of knowledge conflict at mid-layers.<n>We propose textscSpARE, a representation engineering method that uses pre-trained sparse auto-encoders.
arXiv Detail & Related papers (2024-10-21T13:30:47Z) - Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents [56.822238860147024]
Augmenting large language models with external tools has emerged as a promising approach to extend their utility.<n>Previous methods manually parse tool documentation and create in-context demonstrations, transforming tools into structured formats for LLMs to use in their step-by-step reasoning.<n>We propose AutoTools, a framework that enables LLMs to automate the tool-use workflow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z) - Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models [51.72963030032491]
Knowledge documents for large language models (LLMs) may conflict with the memory of LLMs due to outdated or incorrect knowledge.
We construct a new dataset, dubbed KNOT, for knowledge conflict resolution examination in the form of question answering.
arXiv Detail & Related papers (2024-04-04T16:40:11Z) - LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) only reach a correctness rate in the range of 30% to 60%.
We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE)
STE orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory.
arXiv Detail & Related papers (2024-03-07T18:50:51Z) - Efficient Tool Use with Chain-of-Abstraction Reasoning [63.08202389132155]
Large language models (LLMs) need to ground their reasoning to real-world knowledge.<n>There remains challenges for fine-tuning LLM agents to invoke tools in multi-step reasoning problems.<n>We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z) - MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use [79.87054552116443]
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities.<n>We introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools.<n>We conduct experiments involving eight popular LLMs and find that the majority of them still struggle to effectively select tools.
arXiv Detail & Related papers (2023-10-04T19:39:26Z) - CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models [74.22729793816451]
Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability.
We propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization.
We evaluate CREATOR on MATH and TabMWP benchmarks, respectively consisting of challenging math competition problems.
arXiv Detail & Related papers (2023-05-23T17:51:52Z) - Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large
Language Models in Knowledge Conflicts [21.34852490049787]
We present the first comprehensive and controlled investigation into the behavior of large language models (LLMs) when encountering knowledge conflicts.
We find that LLMs can be highly receptive to external evidence even when that conflicts with their parametric memory.
On the other hand, LLMs also demonstrate a strong confirmation bias when the external evidence contains some information consistent with their parametric memory.
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.