SAKE: Steering Activations for Knowledge Editing
- URL: http://arxiv.org/abs/2503.01751v1
- Date: Mon, 03 Mar 2025 17:20:29 GMT
- Title: SAKE: Steering Activations for Knowledge Editing
- Authors: Marco Scialanga, Thibault Laugel, Vincent Grari, Marcin Detyniecki
- Abstract summary: We propose SAKE, a steering activation method that models a fact to be edited as a distribution rather than a single prompt. Several numerical experiments demonstrate the effectiveness of this method.
- Score: 6.089774484591287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Large Language Models have been shown to memorize real-world facts, the need to update this knowledge in a controlled and efficient manner arises. Designed with these constraints in mind, Knowledge Editing (KE) approaches propose to alter specific facts in pretrained models. However, they have been shown to suffer from several limitations, including their lack of contextual robustness and their failure to generalize to logical implications related to the fact. To overcome these issues, we propose SAKE, a steering activation method that models a fact to be edited as a distribution rather than a single prompt. Leveraging Optimal Transport, SAKE alters the LLM behavior over a whole fact-related distribution, defined as paraphrases and logical implications. Several numerical experiments demonstrate the effectiveness of this method: SAKE is thus able to perform more robust edits than its existing counterparts.
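To make the idea concrete, below is a minimal, illustrative sketch of distribution-level activation steering using a closed-form Gaussian optimal-transport map. This is not the authors' implementation: the function names, the Gaussian approximation of the activation distributions, the layer choice, and the toy data are all assumptions made for illustration; SAKE's actual procedure is defined in the paper.

```python
# Illustrative sketch (not the paper's code): steer activations from the
# distribution induced by the old fact toward the distribution induced by
# the edited fact, via the closed-form optimal-transport (Monge) map
# between Gaussian approximations of the two activation distributions.
import numpy as np

def gaussian_ot_map(X_src, X_tgt, eps=1e-6):
    """Return T(x) = mu_t + A (x - mu_s), the Gaussian OT map with
    A = Sigma_s^{-1/2} (Sigma_s^{1/2} Sigma_t Sigma_s^{1/2})^{1/2} Sigma_s^{-1/2}.
    X_src, X_tgt: (n_samples, hidden_dim) activation matrices."""
    mu_s, mu_t = X_src.mean(axis=0), X_tgt.mean(axis=0)
    d = X_src.shape[1]
    Sig_s = np.cov(X_src, rowvar=False) + eps * np.eye(d)  # regularized covariances
    Sig_t = np.cov(X_tgt, rowvar=False) + eps * np.eye(d)

    def sqrtm(M):
        # Symmetric PSD matrix square root via eigendecomposition.
        w, V = np.linalg.eigh(M)
        return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

    S_half = sqrtm(Sig_s)
    S_half_inv = np.linalg.inv(S_half)
    A = S_half_inv @ sqrtm(S_half @ Sig_t @ S_half) @ S_half_inv
    return lambda x: mu_t + (x - mu_s) @ A.T

# Toy usage with random stand-ins for hidden states collected at one layer
# over paraphrases of the old fact (source) and the edited fact (target).
rng = np.random.default_rng(0)
X_old = rng.normal(0.0, 1.0, size=(64, 16))  # e.g. prompts stating the old fact
X_new = rng.normal(1.0, 0.5, size=(64, 16))  # e.g. prompts stating the new fact
steer = gaussian_ot_map(X_old, X_new)
h = rng.normal(size=16)        # a fresh activation encountered at inference
h_steered = steer(h)           # mapped toward the edited-fact distribution
```

In practice, the source and target samples would be hidden states gathered at a chosen layer over paraphrases and logical implications of the old and edited fact, and the learned map would be applied as an inference-time intervention at that layer.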
Related papers
- Are We Evaluating the Edit Locality of LLM Model Editing Properly? [68.441768731381]
We find that existing specificity evaluation protocols are inadequate for this purpose. Existing specificity metrics are weakly correlated with the strength of specificity regularizers. We also find that current metrics lack sufficient sensitivity, rendering them ineffective at distinguishing the specificity performance of different methods.
arXiv Detail & Related papers (2026-01-24T07:07:21Z) - An Information-Theoretic Framework for Robust Large Language Model Editing [17.984683741974063]
Large Language Models (LLMs) have become indispensable tools in science, technology, and society. Errors or outdated information within these models can undermine their accuracy and restrict their safe deployment. We introduce a novel framework for editing LLMs, grounded in information bottleneck theory. We present the Information Bottleneck Knowledge Editor (IBKE), which leverages compact latent representations to guide gradient-based updates.
arXiv Detail & Related papers (2025-12-18T06:21:17Z) - ThinkEval: Practical Evaluation of Knowledge Leakage in LLM Editing using Thought-based Knowledge Graphs [3.9295613363026174]
We present ThinkEval, a framework to quantify indirect knowledge leakage and ripple effects in model-editing. ThinkEval builds and employs specialized knowledge graphs to analyze the causal structure of facts before and after editing. We evaluate five editing techniques: AlphaEdit, RECT, ROME, MEMIT, and PRUNE.
arXiv Detail & Related papers (2025-06-02T07:24:12Z) - InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing [77.47790551485721]
In-context learning is a promising editing method that incorporates edit information through context encoding. However, it is constrained by the limited context window of large language models. We propose InComeS, a flexible framework that enhances LLMs' ability to process editing contexts.
arXiv Detail & Related papers (2025-05-28T09:20:18Z) - Mitigating Content Effects on Reasoning in Language Models through Fine-Grained Activation Steering [14.298418197820912]
Large language models (LLMs) frequently demonstrate reasoning limitations, often conflating content plausibility with logical validity. This can result in biased inferences, where plausible arguments are incorrectly deemed logically valid or vice versa. This paper investigates the problem of mitigating content biases on formal reasoning through activation steering.
arXiv Detail & Related papers (2025-05-18T01:34:34Z) - CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners [88.35958039968081]
CaKE (Circuit-aware Knowledge Editing) is a novel method that enables more effective integration of updated knowledge in large language models.
Results show that CaKE enables more accurate and consistent use of updated knowledge across related reasoning tasks.
arXiv Detail & Related papers (2025-03-20T17:14:34Z) - Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs [47.06544781855325]
We propose a Fine-grained Neuron-level Knowledge Editing (FiNE) method that enhances editing locality without affecting success rates.
By precisely identifying and modifying specific neurons within feed-forward networks, FiNE significantly improves knowledge localization and editing.
arXiv Detail & Related papers (2025-03-03T01:30:28Z) - Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning [29.20378857521518]
Large language models (LLMs) have achieved remarkable performance on various natural language tasks.
Because they are trained on static corpora, their knowledge can quickly become outdated in a fast-changing world.
Previous efforts often sought to update a small number of parameters in specific layer(s) of an LLM.
We propose BaFT to manage different types of knowledge in an adaptive way, thereby achieving a better editing-locality trade-off.
arXiv Detail & Related papers (2025-03-01T02:34:44Z) - Joint Localization and Activation Editing for Low-Resource Fine-Tuning [73.64004083269424]
We propose a joint localization and activation editing (JoLA) method. JoLA learns (1) which heads in the Transformer to edit, (2) whether the intervention should be additive, multiplicative, or both, and (3) the intervention parameters themselves. Through evaluations on three benchmarks spanning commonsense reasoning, natural language understanding, and natural language generation, we demonstrate that JoLA consistently outperforms existing methods.
arXiv Detail & Related papers (2025-02-03T09:13:09Z) - Disentangling Memory and Reasoning Ability in Large Language Models [97.26827060106581]
We propose a new inference paradigm that decomposes the complex inference process into two distinct and clear actions.
Our experiment results show that this decomposition improves model performance and enhances the interpretability of the inference process.
arXiv Detail & Related papers (2024-11-20T17:55:38Z) - Uncovering Overfitting in Large Language Model Editing [35.55260822503773]
We identify and investigate the phenomenon of Editing Overfit, where edited models assign disproportionately high probabilities to the edit target.
We propose a new plug-and-play strategy called Learn to Inference (LTI), which introduces a Multi-stage Inference Constraint module to guide edited models in recalling new knowledge.
arXiv Detail & Related papers (2024-10-10T11:09:00Z) - FAME: Towards Factual Multi-Task Model Editing [4.858226284963096]
Large language models (LLMs) embed extensive knowledge and utilize it to perform exceptionally well across various tasks.
We present FAME, a factual, comprehensive, and multi-task dataset designed to enhance the practicality of model editing.
We then propose SKEME, a model editing method that uses a novel caching mechanism to ensure synchronization with the real world.
arXiv Detail & Related papers (2024-10-07T13:46:06Z) - How Well Can Knowledge Edit Methods Edit Perplexing Knowledge? [18.022428746019582]
Large language models (LLMs) have demonstrated remarkable capabilities, but updating their knowledge post-training remains a critical challenge. We introduce the concept of "perplexingness": the degree to which new knowledge conflicts with an LLM's learned conceptual hierarchies and categorical relationships. Our analysis reveals that edits involving more abstract concepts (hypernyms) generally exhibit higher perplexingness and are more resistant to modification than their specific counterparts (hyponyms).
arXiv Detail & Related papers (2024-06-25T03:41:02Z) - EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries [69.72012539060731]
We introduce a theoretical framework for efficient knowledge editing (KE) in large language models (LLMs).
We propose a novel task of event-based knowledge editing that pairs facts with event descriptions.
We empirically demonstrate the superiority of event-based editing over the existing setting on resolving uncertainty in edited models.
arXiv Detail & Related papers (2024-02-17T16:34:50Z) - KnowTuning: Knowledge-aware Fine-tuning for Large Language Models [83.5849717262019]
We propose a knowledge-aware fine-tuning (KnowTuning) method to improve fine-grained and coarse-grained knowledge awareness of LLMs.
KnowTuning generates more facts with a lower factual error rate under fine-grained fact evaluation.
arXiv Detail & Related papers (2024-02-17T02:54:32Z) - Evaluating Dependencies in Fact Editing for Language Models: Specificity and Implication Awareness [26.589633375359647]
We aim to ensure that the editing of learned facts respects internal logical constraints, known as the dependency of knowledge.
Existing work on editing LLMs has partially addressed this issue of dependency, namely that the editing of a fact should apply to its lexical variations without disrupting irrelevant ones.
We propose an evaluation protocol with an accompanying question-answering dataset, DepEdit, that provides a comprehensive assessment of the editing process.
arXiv Detail & Related papers (2023-12-04T12:45:30Z) - Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge [72.63368052592004]
We study LMs' abilities to make inferences based on injected facts (or to propagate those facts).
We find that existing methods for updating knowledge show little propagation of injected knowledge.
Yet, prepending entity definitions in an LM's context improves performance across all settings.
arXiv Detail & Related papers (2023-05-02T17:59:46Z)