Robust Knowledge Editing via Explicit Reasoning Chains for Distractor-Resilient Multi-Hop QA
- URL: http://arxiv.org/abs/2509.01468v1
- Date: Mon, 01 Sep 2025 13:37:42 GMT
- Title: Robust Knowledge Editing via Explicit Reasoning Chains for Distractor-Resilient Multi-Hop QA
- Authors: Yuchen Wu, Liang Ding, Li Shen, Dacheng Tao
- Abstract summary: Reason-KE steers a pretrained large language model through four structured stages (fact acknowledgment, relevance determination, selective application, and final reasoning) to filter distractors in a single pass. Trained on MQuAKE-CF with up to four irrelevant facts, Reason-KE elevates QA accuracy to 90.2% while suffering merely a 6.3% drop under heavy distraction and <1% when answers are leaked.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) encode vast amounts of world knowledge but remain static once trained, making the timely integration of emerging facts prohibitively expensive via full retraining. Knowledge-editing techniques have thus emerged to inject or overwrite specific facts into LLMs, yet they either over-rely on superficial cues or incur complex, iterative pipelines that collapse under noisy, multi-hop conditions. We introduce Reason-KE, an end-to-end reasoning-chain-based editing framework that steers a pretrained LLM through four structured stages (fact acknowledgment, relevance determination, selective application, and final reasoning) to filter distractors in a single pass. Trained on MQuAKE-CF with up to four irrelevant facts, Reason-KE elevates Qwen2.5-7B's multi-hop QA accuracy to 90.2% while suffering merely a 6.3% drop under heavy distraction and <1% when answers are leaked. Our quantitative analysis confirms Reason-KE's resilience and efficiency, establishing a new state-of-the-art for reliable LLM knowledge updates.
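The four-stage, single-pass pipeline described in the abstract can be sketched as a prompt builder that presents the edited facts (including distractors) and walks the model through each stage in order. The function name, stage wording, and prompt layout below are illustrative assumptions, not the authors' released implementation.

```python
def build_reason_ke_prompt(edited_facts, question):
    """Compose one single-pass prompt that steers an LLM through the four
    Reason-KE stages: fact acknowledgment, relevance determination,
    selective application, and final reasoning."""
    # List every injected fact; under the distractor setting some are irrelevant.
    fact_block = "\n".join(f"- {fact}" for fact in edited_facts)
    stages = (
        "1. Fact acknowledgment: restate each edited fact.\n"
        "2. Relevance determination: mark each fact as relevant or "
        "irrelevant to the question.\n"
        "3. Selective application: keep only the relevant facts.\n"
        "4. Final reasoning: derive the multi-hop answer step by step."
    )
    return (
        f"Edited facts (some may be distractors):\n{fact_block}\n\n"
        f"Question: {question}\n\n"
        f"Follow these stages in order:\n{stages}\n"
        "Answer:"
    )
```

In use, this prompt would be sent to the edited model (e.g., Qwen2.5-7B in the paper's experiments) in one forward pass, rather than through an iterative retrieve-and-verify loop.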
Related papers
- Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements [78.87065404966002]
Existing benchmarks predominantly curate questions at the question level. We propose Encyclo-K, a statement-based benchmark that rethinks benchmark construction from the ground up.
arXiv Detail & Related papers (2025-12-31T13:55:54Z) - Reason-KE++: Aligning the Process, Not Just the Outcome, for Faithful LLM Knowledge Editing [63.96040994220329]
We find that SFT-based methods, e.g., Reason-KE, suffer from a "faithfulness gap". This gap enables the LLM's powerful parametric priors to override new contextual facts. We propose Reason-KE++, an SFT+RL framework that instills process-level faithfulness.
arXiv Detail & Related papers (2025-11-16T15:49:01Z) - Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs [18.37387666170851]
We propose Erasable Reinforcement Learning (ERL), a novel framework that transforms fragile reasoning into a robust process. ERL explicitly identifies faulty steps, erases them, and regenerates reasoning in place, preventing defective logic from propagating through the reasoning chain. Models trained with ERL, termed ESearch, achieve substantial improvements on HotpotQA, MuSiQue, 2Wiki, and Bamboogle.
arXiv Detail & Related papers (2025-10-01T13:10:36Z) - GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input. GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak).
arXiv Detail & Related papers (2024-10-11T03:05:06Z) - RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning [16.486529625382182]
We propose RippleCOT, a novel ICL editing approach integrating Chain-of-Thought reasoning.
We show that RippleCOT significantly outperforms the state-of-the-art on the ripple effect, achieving accuracy gains ranging from 7.8% to 87.1%.
arXiv Detail & Related papers (2024-10-04T03:37:36Z) - Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning [18.283963879468466]
Large language models (LLMs) demonstrate remarkable capabilities but face challenges from hallucinations. We introduce Uncertainty-and-Sensitivity-Aware Tuning (US-Tuning), a novel two-stage approach for contextual question answering. Our experimental results demonstrate that US-Tuning not only significantly reduces incorrect answers in contextual QA but also improves models' faithfulness to their parametric knowledge.
arXiv Detail & Related papers (2024-06-14T14:56:04Z) - Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering [14.389264346634507]
We propose EFSum, an Evidence-focused Fact Summarization framework for enhanced Question Answering (QA) performance.
Our experiments show that EFSum improves LLM's zero-shot QA performance.
arXiv Detail & Related papers (2024-03-05T13:43:58Z) - Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers.
We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
arXiv Detail & Related papers (2024-02-17T05:22:56Z) - DeepEdit: Knowledge Editing as Decoding with Constraints [118.78008395850888]
How to edit knowledge used in multi-step reasoning has become the major challenge in the knowledge editing (KE) of large language models (LLMs).
We propose a new KE framework, DEEPEDIT, which enhances LLMs' ability to generate coherent reasoning chains with new knowledge through depth-first search.
In addition to DEEPEDIT, we propose two new KE benchmarks: MQUAKE-2002 and MQUAKE-HARD, which provide more precise and challenging assessments of KE approaches.
arXiv Detail & Related papers (2024-01-19T03:48:27Z) - Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models [122.19845578690466]
Step-Back Prompting enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details.
Using the concepts and principles to guide reasoning, LLMs significantly improve their abilities in following a correct reasoning path towards the solution.
arXiv Detail & Related papers (2023-10-09T19:48:55Z) - Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering [17.672572064705445]
Large language models (LLMs) equipped with Chain-of-Thought (CoT) have shown impressive reasoning ability in various downstream tasks.
We propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge.
arXiv Detail & Related papers (2023-08-25T09:23:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.