Robust Knowledge Editing via Explicit Reasoning Chains for Distractor-Resilient Multi-Hop QA
- URL: http://arxiv.org/abs/2509.01468v1
- Date: Mon, 01 Sep 2025 13:37:42 GMT
- Title: Robust Knowledge Editing via Explicit Reasoning Chains for Distractor-Resilient Multi-Hop QA
- Authors: Yuchen Wu, Liang Ding, Li Shen, Dacheng Tao
- Abstract summary: Reason-KE steers a pretrained large language model through four structured stages (fact acknowledgment, relevance determination, selective application, and final reasoning) to filter distractors in a single pass. Trained on MQuAKE-CF with up to four irrelevant facts, Reason-KE elevates QA accuracy to 90.2% while suffering merely a 6.3% drop under heavy distraction and <1% when answers are leaked.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) encode vast amounts of world knowledge but remain static once trained, making the timely integration of emerging facts prohibitively expensive via full retraining. Knowledge-editing techniques have thus emerged to inject or overwrite specific facts into LLMs, yet they either over-rely on superficial cues or incur complex, iterative pipelines that collapse under noisy, multi-hop conditions. We introduce Reason-KE, an end-to-end reasoning-chain-based editing framework that steers a pretrained LLM through four structured stages (fact acknowledgment, relevance determination, selective application, and final reasoning) to filter distractors in a single pass. Trained on MQuAKE-CF with up to four irrelevant facts, Reason-KE elevates Qwen2.5-7B's multi-hop QA accuracy to 90.2% while suffering merely a 6.3% drop under heavy distraction and <1% when answers are leaked. Our quantitative analysis confirms Reason-KE's resilience and efficiency, establishing a new state-of-the-art for reliable LLM knowledge updates.
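The four-stage, single-pass pipeline described in the abstract can be sketched as a prompt builder that presents the edited facts (including distractors) and walks the model through each stage in order. The function name, stage wording, and prompt layout below are illustrative assumptions, not the authors' released implementation.

```python
def build_reason_ke_prompt(edited_facts, question):
    """Compose one single-pass prompt that steers an LLM through the four
    Reason-KE stages: fact acknowledgment, relevance determination,
    selective application, and final reasoning."""
    # List every injected fact; under the distractor setting some are irrelevant.
    fact_block = "\n".join(f"- {fact}" for fact in edited_facts)
    stages = (
        "1. Fact acknowledgment: restate each edited fact.\n"
        "2. Relevance determination: mark each fact as relevant or "
        "irrelevant to the question.\n"
        "3. Selective application: keep only the relevant facts.\n"
        "4. Final reasoning: derive the multi-hop answer step by step."
    )
    return (
        f"Edited facts (some may be distractors):\n{fact_block}\n\n"
        f"Question: {question}\n\n"
        f"Follow these stages in order:\n{stages}\n"
        "Answer:"
    )
```

In use, this prompt would be sent to the edited model (e.g., Qwen2.5-7B in the paper's experiments) in one forward pass, rather than through an iterative retrieve-and-verify loop.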
Related papers
- Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements [78.87065404966002]
Existing benchmarks predominantly curate questions at the question level. We propose Encyclo-K, a statement-based benchmark that rethinks benchmark construction from the ground up.
arXiv Detail & Related papers (2025-12-31T13:55:54Z) - Reason-KE++: Aligning the Process, Not Just the Outcome, for Faithful LLM Knowledge Editing [63.96040994220329]
We find that SFT-based methods, e.g., Reason-KE, suffer from a "faithfulness gap". This gap enables the LLM's powerful parametric priors to override new contextual facts. We propose Reason-KE++, an SFT+RL framework that instills process-level faithfulness.
arXiv Detail & Related papers (2025-11-16T15:49:01Z) - Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs [18.37387666170851]
We propose Erasable Reinforcement Learning (ERL), a novel framework that transforms fragile reasoning into a robust process. ERL explicitly identifies faulty steps, erases them, and regenerates reasoning in place, preventing defective logic from propagating through the reasoning chain. Models trained with ERL, termed ESearch, achieve substantial improvements on HotpotQA, MuSiQue, 2Wiki, and Bamboogle.
arXiv Detail & Related papers (2025-10-01T13:10:36Z) - GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input. GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak).
arXiv Detail & Related papers (2024-10-11T03:05:06Z) - RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning [16.486529625382182]
We propose RippleCOT, a novel ICL editing approach integrating Chain-of-Thought reasoning.
We show that RippleCOT significantly outperforms the state-of-the-art on the ripple effect, achieving accuracy gains ranging from 7.8% to 87.1%.
arXiv Detail & Related papers (2024-10-04T03:37:36Z) - Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning [18.283963879468466]
Large language models (LLMs) demonstrate remarkable capabilities but face challenges from hallucinations. We introduce Uncertainty-and-Sensitivity-Aware Tuning (US-Tuning), a novel two-stage approach for contextual question answering. Our experimental results demonstrate that US-Tuning not only significantly reduces incorrect answers in contextual QA but also improves models' faithfulness to their parametric knowledge.
arXiv Detail & Related papers (2024-06-14T14:56:04Z) - Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering [14.389264346634507]
We propose EFSum, an Evidence-focused Fact Summarization framework for enhanced Question Answering (QA) performance.
Our experiments show that EFSum improves LLM's zero-shot QA performance.
arXiv Detail & Related papers (2024-03-05T13:43:58Z) - Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers.
We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
arXiv Detail & Related papers (2024-02-17T05:22:56Z) - DeepEdit: Knowledge Editing as Decoding with Constraints [118.78008395850888]
How to edit knowledge used in multi-step reasoning has become the major challenge in the knowledge editing (KE) of large language models (LLMs).
We propose a new KE framework, DEEPEDIT, which enhances LLMs' ability to generate coherent reasoning chains with new knowledge through depth-first search.
In addition to DEEPEDIT, we propose two new KE benchmarks: MQUAKE-2002 and MQUAKE-HARD, which provide more precise and challenging assessments of KE approaches.
arXiv Detail & Related papers (2024-01-19T03:48:27Z) - Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models [122.19845578690466]
Step-Back Prompting enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details.
Using the concepts and principles to guide reasoning, LLMs significantly improve their abilities in following a correct reasoning path towards the solution.
arXiv Detail & Related papers (2023-10-09T19:48:55Z) - Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering [17.672572064705445]
Large language models (LLMs) equipped with Chain-of-Thought (CoT) have shown impressive reasoning ability in various downstream tasks.
We propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge.
arXiv Detail & Related papers (2023-08-25T09:23:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.