Fugu-MT Paper Translation (Abstract): Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge

Paper overview: Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge

arxiv url: http://arxiv.org/abs/2601.15495v1
Date: Wed, 21 Jan 2026 21:56:35 GMT
Status: Translation complete
Last updated in system: 2026-01-23 21:37:20.428998
Title: Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge
Title (reference translation): Tracking the Limits of Knowledge Propagation: LLM Failures in Multi-Step Reasoning and Knowledge Conflicts
Authors: Yiyang Feng, Zeming Chen, Haotian Wu, Jiawei Zhou, Antoine Bosselut
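Abstract summary: TRACK (Testing Reasoning Amid Conflicting Knowledge) is a new benchmark for studying how LLMs propagate new knowledge through multi-step reasoning. The results suggest that providing updated facts to a model can worsen performance compared with providing no updated facts. This failure is shown to stem both from an inability to faithfully integrate updated facts and from flawed reasoning even when the knowledge is integrated.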
Reference score (independently computed attention score): 26.769199929372956
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A common solution for mitigating outdated or incorrect information in Large Language Models (LLMs) is to provide updated facts in-context or through knowledge editing. However, these methods introduce knowledge conflicts when the knowledge update fails to overwrite the model's parametric knowledge, and these conflicts propagate to faulty reasoning. Current benchmarks for this problem largely focus only on single knowledge updates and fact recall, without evaluating how these updates affect downstream reasoning. In this work, we introduce TRACK (Testing Reasoning Amid Conflicting Knowledge), a new benchmark for studying how LLMs propagate new knowledge through multi-step reasoning when it conflicts with the model's initial parametric knowledge. Spanning three reasoning-intensive scenarios (WIKI, CODE, and MATH), TRACK introduces multiple, realistic conflicts to mirror real-world complexity. Our results on TRACK reveal that providing updated facts to models for reasoning can worsen performance compared to providing no updated facts to a model, and that this performance degradation worsens as more updated facts are provided. We show this failure stems both from an inability to faithfully integrate updated facts and from flawed reasoning even when knowledge is integrated. TRACK provides a rigorous new benchmark to measure and guide future progress on propagating conflicting knowledge in multi-step reasoning.
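Abstract (reference translation): In Large Language Models (LLMs), a common solution for mitigating outdated or incorrect information is to provide updated facts in-context or through knowledge editing. However, these methods cause knowledge conflicts when the knowledge update fails to overwrite the model's parametric knowledge. Current benchmarks for this problem focus mainly on single knowledge updates and fact recall, without evaluating how these updates affect downstream reasoning. This work introduces TRACK (Testing Reasoning Amid Conflicting Knowledge), a new benchmark for studying how LLMs propagate new knowledge through multi-step reasoning when it conflicts with the model's initial parametric knowledge. Spanning three reasoning-intensive scenarios (WIKI, CODE, and MATH), TRACK introduces multiple realistic conflicts that reflect real-world complexity. The results on TRACK suggest that providing updated facts to a model can worsen performance compared with providing no updated facts. This failure is shown to stem both from an inability to faithfully integrate updated facts and from flawed reasoning even when the knowledge is integrated. TRACK provides a rigorous new benchmark for measuring and guiding future progress on propagating conflicting knowledge in multi-step reasoning.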


Related papers