Related papers: Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge

URL: http://arxiv.org/abs/2601.15495v1
Date: Wed, 21 Jan 2026 21:56:35 GMT
Title: Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge
Authors: Yiyang Feng, Zeming Chen, Haotian Wu, Jiawei Zhou, Antoine Bosselut
Abstract summary: We introduce TRACK (Testing Reasoning Amid Conflicting Knowledge), a new benchmark for studying how LLMs propagate new knowledge through multi-step reasoning. Our results reveal that providing updated facts to models for reasoning can worsen performance compared to providing no updated facts. We show this failure stems both from an inability to faithfully integrate updated facts and from flawed reasoning even when the knowledge is integrated.
Score: 26.769199929372956
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A common solution for mitigating outdated or incorrect information in Large Language Models (LLMs) is to provide updated facts in-context or through knowledge editing. However, these methods introduce knowledge conflicts when the knowledge update fails to overwrite the model's parametric knowledge, and these conflicts propagate to faulty reasoning. Current benchmarks for this problem largely focus on single knowledge updates and fact recall, without evaluating how these updates affect downstream reasoning. In this work, we introduce TRACK (Testing Reasoning Amid Conflicting Knowledge), a new benchmark for studying how LLMs propagate new knowledge through multi-step reasoning when it conflicts with the model's initial parametric knowledge. Spanning three reasoning-intensive scenarios (WIKI, CODE, and MATH), TRACK introduces multiple, realistic conflicts to mirror real-world complexity. Our results on TRACK reveal that providing updated facts to models for reasoning can worsen performance compared to providing no updated facts, and that this performance degradation worsens as more updated facts are provided. We show this failure stems both from an inability to faithfully integrate updated facts and from flawed reasoning even when the knowledge is integrated. TRACK provides a rigorous new benchmark to measure and guide future progress on propagating conflicting knowledge in multi-step reasoning.

Related papers

Model Merging for Knowledge Editing [53.799891745131724]
Large Language Models (LLMs) require continuous updates to maintain accurate and current knowledge as the world evolves. Existing knowledge editing approaches offer various solutions for knowledge updating, but they often struggle with sequential editing scenarios. This paper proposes a two-stage framework combining robust supervised fine-tuning (R-SFT) with model merging for knowledge editing.
arXiv Detail & Related papers (2025-06-14T07:42:39Z)
FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation [37.28571879699906]
Large language models (LLMs) augmented with retrieval systems have demonstrated significant potential in handling knowledge-intensive tasks. This paper proposes FaithfulRAG, a novel framework that resolves knowledge conflicts by explicitly modeling discrepancies between the model's parametric knowledge and retrieved context.
arXiv Detail & Related papers (2025-06-10T16:02:54Z)
Decoupling Reasoning and Knowledge Injection for In-Context Knowledge Editing [12.5122702720856]
In-context editing (ICE) offers a lightweight solution by injecting new knowledge directly into the input context. Existing ICE approaches do not explicitly separate the newly injected knowledge from the model's original reasoning process. We propose DecKER, a novel ICE framework that decouples reasoning from knowledge editing by generating a masked reasoning path.
arXiv Detail & Related papers (2025-05-31T12:51:12Z)
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners [109.87058236007907]
CaKE (Circuit-aware Knowledge Editing) is a novel method that enhances the effective integration of updated knowledge in large language models. Experiments show that CaKE enables more accurate and consistent use of edited knowledge across related reasoning tasks.
arXiv Detail & Related papers (2025-03-20T17:14:34Z)
Knowledge Updating? No More Model Editing! Just Selective Contextual Reasoning [38.018263569983226]
We provide an evaluation of ten model editing methods along four dimensions: reliability, generalization, locality, and portability. We then propose a straightforward method called Selective Contextual Reasoning (SCR) for knowledge updating.
arXiv Detail & Related papers (2025-03-07T08:04:25Z)
CoME: An Unlearning-based Approach to Conflict-free Model Editing [8.215201299292033]
Large language models (LLMs) often retain outdated or incorrect information from pre-training, which undermines their reliability. We propose Conflict-free Model Editing (CoME), a novel framework that enhances the accuracy of knowledge updates in LLMs by selectively removing outdated knowledge.
arXiv Detail & Related papers (2025-02-20T04:55:38Z)
Studying Large Language Model Behaviors Under Context-Memory Conflicts With Real Documents [54.953320616069654]
Retrieval-augmented generation mitigates many problems of fully parametric language models. In RAG, the model's knowledge can be updated from documents provided in context. We present a framework for studying such knowledge conflicts in a realistic setup.
arXiv Detail & Related papers (2024-04-24T17:59:36Z)
Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing. Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z)
A Comprehensive Study of Knowledge Editing for Large Language Models [82.65729336401027]
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. This paper defines the knowledge editing problem and provides a comprehensive review of cutting-edge approaches. We introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches.
arXiv Detail & Related papers (2024-01-02T16:54:58Z)
R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face challenges. Previous instruction tuning methods force the model to complete a sentence regardless of whether the model knows the relevant knowledge. We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning). Experimental results demonstrate that R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or propagate those facts). We find that existing methods for updating knowledge show little propagation of injected knowledge. Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.