RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning
- URL: http://arxiv.org/abs/2410.03122v1
- Date: Fri, 4 Oct 2024 03:37:36 GMT
- Title: RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning
- Authors: Zihao Zhao, Yuchen Yang, Yijiang Li, Yinzhi Cao
- Abstract summary: We propose RippleCOT, a novel ICL editing approach integrating Chain-of-Thought reasoning.
We show that RippleCOT significantly outperforms the state-of-the-art on the ripple effect, achieving accuracy gains ranging from 7.8% to 87.1%.
- Score: 16.486529625382182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ripple effect poses a significant challenge in knowledge editing for large language models. Namely, when a single fact is edited, the model struggles to accurately update the related facts in a sequence, which is evaluated by multi-hop questions linked to a chain of related facts. Recent strategies have moved away from traditional parameter updates to more flexible, less computation-intensive methods, proven to be more effective in addressing the ripple effect. In-context learning (ICL) editing uses a simple demonstration `Imagine that + new fact` to guide LLMs, but struggles with complex multi-hop questions as the new fact alone fails to specify the chain of facts involved in such scenarios. In addition, memory-based editing maintains additional storage for all edits and related facts, requiring continuous updates to stay effective. As a result of these design limitations, the challenge remains, with the highest accuracy being only 33.8% on the MQuAKE-cf benchmark for Vicuna-7B. To address this, we propose RippleCOT, a novel ICL editing approach integrating Chain-of-Thought (COT) reasoning. RippleCOT structures demonstrations as `new fact, question, thought, answer`, incorporating a thought component to identify and decompose the multi-hop logic within questions. This approach effectively guides the model through complex multi-hop questions with chains of related facts. Comprehensive experiments demonstrate that RippleCOT significantly outperforms the state-of-the-art on the ripple effect, achieving accuracy gains ranging from 7.8% to 87.1%.
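To make the demonstration format concrete, here is a minimal sketch of how a RippleCOT-style `new fact, question, thought, answer` prompt could be assembled. Only the four-part structure comes from the abstract; the template wording, helper names, and the worked example are illustrative assumptions, not the paper's verbatim format.

```python
# Illustrative sketch of a RippleCOT-style in-context demonstration.
# Only the `new fact, question, thought, answer` structure comes from
# the abstract; the template wording and example are assumptions.

DEMO_TEMPLATE = (
    "New fact: {new_fact}\n"
    "Question: {question}\n"
    "Thought: {thought}\n"
    "Answer: {answer}\n"
)

def build_prompt(demos, edited_fact, test_question):
    """Assemble an ICL prompt: worked demonstrations followed by the query."""
    parts = [DEMO_TEMPLATE.format(**d) for d in demos]
    parts.append(
        f"New fact: {edited_fact}\n"
        f"Question: {test_question}\n"
        "Thought:"  # the model continues with its own chain of thought
    )
    return "\n".join(parts)

# Hypothetical demonstration: the thought decomposes the two-hop logic.
demos = [{
    "new_fact": "The president of the United States is Joe Biden.",
    "question": "Who is the spouse of the president of the United States?",
    "thought": ("The new fact says the president is Joe Biden. "
                "Joe Biden's spouse is Jill Biden."),
    "answer": "Jill Biden",
}]

prompt = build_prompt(
    demos,
    edited_fact="The CEO of Twitter is Elon Musk.",
    test_question="In which country was the CEO of Twitter born?",
)
```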
Related papers
- Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding [59.60915947702282]
Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in enhancing the reasoning capabilities of large language models (LLMs). Existing RLVR methods often suffer from exploration inefficiency due to mismatches between the training data's difficulty and the model's capability. We propose SEELE, a novel supervision-aided RLVR framework that dynamically adjusts problem difficulty to stay within the high-efficiency region.
arXiv Detail & Related papers (2025-09-08T17:36:21Z) - Robust Knowledge Editing via Explicit Reasoning Chains for Distractor-Resilient Multi-Hop QA [63.96040994220329]
Reason-KE steers a pretrained large language model through four structured stages (fact acknowledgment, relevance determination, selective application, and final reasoning) to filter distractors in a single pass. Trained on MQuAKE-CF with up to four irrelevant facts, Reason-KE elevates QA accuracy to 90.2% while suffering merely a 6.3% drop under heavy distraction and 1% when answers are leaked.
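As a rough illustration of how these four stages could be laid out in a single-pass prompt: the stage names come from the abstract, but the per-stage instructions below are assumptions, not Reason-KE's actual prompt.

```python
# Illustrative sketch of a four-stage prompt in the spirit of Reason-KE.
# Stage names are from the abstract; the instruction attached to each
# stage is an assumption for illustration.

STAGES = [
    ("Fact acknowledgment", "Restate the edited fact(s) provided."),
    ("Relevance determination", "Decide which facts bear on the question; flag distractors."),
    ("Selective application", "Apply only the relevant facts to each hop."),
    ("Final reasoning", "Combine the hops and state the answer."),
]

def staged_prompt(facts, question):
    """Build one single-pass prompt that walks the model through all stages."""
    lines = ["Facts:"] + [f"- {f}" for f in facts]
    lines += [f"Question: {question}", ""]
    for i, (name, instruction) in enumerate(STAGES, 1):
        lines.append(f"Stage {i} ({name}): {instruction}")
    return "\n".join(lines)
```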
arXiv Detail & Related papers (2025-09-01T13:37:42Z) - SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought [8.287063165175667]
Chain-of-Thought (CoT) reasoning incurs significant time costs due to the generation of discrete CoT tokens (DCoT). Existing continuous CoT methods are hampered by indirect fine-tuning, limited alignment, or inconsistent targets. We propose SynAdapt, an innovative efficient reasoning framework.
arXiv Detail & Related papers (2025-08-01T12:17:35Z) - Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [60.04718679054704]
We introduce Sketch-of-Thought (SoT), a novel prompting framework.
It combines cognitive-inspired reasoning paradigms with linguistic constraints to minimize token usage.
SoT achieves token reductions of 76% with negligible accuracy impact.
arXiv Detail & Related papers (2025-03-07T06:57:17Z) - SAKE: Steering Activations for Knowledge Editing [6.089774484591287]
We propose SAKE, a steering activation method that models a fact to be edited as a distribution rather than a single prompt.
Several numerical experiments demonstrate the effectiveness of this method.
arXiv Detail & Related papers (2025-03-03T17:20:29Z) - Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization [49.95746521480879]
We introduce PKUE (Precise Knowledge Utilization Enhancement), which fine-tunes the model on self-generated responses to precise and simple factual questions. Extensive experiments demonstrate that PKUE significantly improves LLM overall performance.
arXiv Detail & Related papers (2025-02-26T13:34:52Z) - AnyEdit: Edit Any Knowledge Encoded in Language Models [69.30638272162267]
We propose AnyEdit, a new autoregressive editing paradigm for large language models (LLMs).
It decomposes long-form knowledge into sequential chunks and iteratively edits the key token in each chunk, ensuring consistent and accurate outputs.
It outperforms strong baselines by 21.5% on benchmarks including UnKEBench, AKEW, and our new EditEverything dataset for long-form diverse-formatted knowledge.
arXiv Detail & Related papers (2025-02-08T16:18:37Z) - Augmenting the Veracity and Explanations of Complex Fact Checking via Iterative Self-Revision with LLMs [10.449165630417522]
We construct two complex fact-checking datasets for Chinese scenarios: CHEF-EG and TrendFact.
These datasets involve complex facts in areas such as health, politics, and society.
We propose a unified framework called FactISR to perform mutual feedback between veracity and explanations.
arXiv Detail & Related papers (2024-10-19T15:25:19Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments [35.3938477255058]
This paper introduces Graph Memory-based Editing for Large Language Models (GMeLLo).
GMeLLo merges the explicit knowledge representation of Knowledge Graphs with the linguistic flexibility of Large Language Models.
Our results show that GMeLLo significantly surpasses current state-of-the-art knowledge editing methods in the multi-hop question answering benchmark, MQuAKE.
arXiv Detail & Related papers (2024-08-28T16:15:45Z) - Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering [47.199078631274745]
Large Language Models (LLMs) have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge.
We propose the Retrieval-Augmented model Editing (RAE) framework for multi-hop question answering.
arXiv Detail & Related papers (2024-03-28T17:47:19Z) - On the Robustness of Editing Large Language Models [57.477943944826904]
Large language models (LLMs) have played a pivotal role in building communicative AI, yet they encounter the challenge of efficient updates.
This work seeks to understand the strengths and limitations of editing methods, facilitating practical applications of communicative AI.
arXiv Detail & Related papers (2024-02-08T17:06:45Z) - The Earth is Flat? Unveiling Factual Errors in Large Language Models [89.94270049334479]
Large Language Models (LLMs) like ChatGPT are used in various applications due to their extensive knowledge from pre-training and fine-tuning.
Despite this, they are prone to generating factual and commonsense errors, raising concerns in critical areas like healthcare, journalism, and education.
We introduce a novel, automatic testing framework, FactChecker, aimed at uncovering factual inaccuracies in LLMs.
arXiv Detail & Related papers (2024-01-01T14:02:27Z) - PokeMQA: Programmable knowledge editing for Multi-hop Question Answering [46.80110170981976]
Multi-hop question answering (MQA) is one of the challenging tasks to evaluate machine's comprehension and reasoning abilities.
We propose a framework, Programmable knowledge editing for Multi-hop Question Answering (PokeMQA).
Specifically, we prompt LLMs to decompose knowledge-augmented multi-hop questions, while interacting with a detached trainable scope detector to modulate LLMs' behavior depending on external conflict signals.
arXiv Detail & Related papers (2023-12-23T08:32:13Z) - EXPLAIN, EDIT, GENERATE: Rationale-Sensitive Counterfactual Data Augmentation for Multi-hop Fact Verification [28.453817513380276]
We develop a rationale-sensitive method to generate linguistically diverse and label-flipping counterfactuals.
Specifically, the diverse and fluent counterfactuals are generated via an Explain-Edit-Generate architecture.
Experimental results show that the proposed approach outperforms the SOTA baselines.
arXiv Detail & Related papers (2023-10-23T02:39:14Z) - MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions [75.21713251369225]
We present a benchmark, MQuAKE, comprising multi-hop questions that assess whether edited models correctly answer questions whose answers should change as a consequence of the edited facts.
We propose a memory-based approach, MeLLo, which stores all edited facts externally while prompting the language model iteratively to generate answers consistent with the edited facts.
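A hedged sketch of the memory-plus-iterative-prompting loop this describes: edited facts live in external storage, and each hop's tentative answer is checked against them. The `llm` callable and the overlap-based retrieval are toy stand-ins for MeLLo's actual model and retriever, and the threshold is an assumption.

```python
# Illustrative sketch of a MeLLo-style loop: all edited facts sit in an
# external memory; at each hop the model proposes a subquestion and a
# tentative answer, and the answer is overridden when a stored edit
# matches that hop. Retrieval here is a toy word-overlap heuristic.

def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def mello_style_answer(question, edited_facts, llm, max_hops=4):
    """`llm` is a hypothetical text-in/text-out callable."""
    context = f"Question: {question}"
    for _ in range(max_hops):
        sub_q = llm(context + "\nNext subquestion (or say DONE):")
        if "DONE" in sub_q:
            break
        tentative = llm(f"{context}\nSubquestion: {sub_q}\nTentative answer:")
        # Check the external memory for an edited fact relevant to this hop.
        fact = max(edited_facts, key=lambda f: word_overlap(f, sub_q))
        if word_overlap(fact, sub_q) > 0.2:  # threshold is an assumption
            tentative = llm(f"Edited fact: {fact}\n"
                            f"Answer consistently with it: {sub_q}\nAnswer:")
        context += f"\nSubquestion: {sub_q}\nAnswer: {tentative}"
    return llm(context + "\nFinal answer:")
```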
arXiv Detail & Related papers (2023-05-24T06:48:41Z) - RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought [56.558892336235914]
Reversing Chain-of-Thought (RCoT) is a novel method to improve large language models' reasoning abilities.
RCoT automatically detects and rectifies factual inconsistencies in generated solutions.
We show that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities.
arXiv Detail & Related papers (2023-05-19T08:02:52Z) - Learning to Ask Conversational Questions by Optimizing Levenshtein Distance [83.53855889592734]
We introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance (MLD) through explicit editing actions.
RISE is able to pay attention to tokens that are related to conversational characteristics.
Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods.
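For reference, the minimum Levenshtein distance that RISE optimizes is the classic edit distance between two sequences; a minimal dynamic-programming implementation is sketched below. This is the metric only, not the RISE framework itself, which learns explicit editing actions via reinforcement learning.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions needed
    to turn a into b (classic DP, O(len(a) * len(b)) time)."""
    prev = list(range(len(b) + 1))          # distance from "" to b[:j]
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3
```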
arXiv Detail & Related papers (2021-06-30T08:44:19Z)