Does Localization Inform Editing? Surprising Differences in
Causality-Based Localization vs. Knowledge Editing in Language Models
- URL: http://arxiv.org/abs/2301.04213v2
- Date: Mon, 16 Oct 2023 17:42:58 GMT
- Title: Does Localization Inform Editing? Surprising Differences in
Causality-Based Localization vs. Knowledge Editing in Language Models
- Authors: Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun
- Abstract summary: We find that we can change how a fact is stored in a model by editing weights that are in a different location than where existing methods suggest that the fact is stored.
This is surprising because we would expect that localizing facts to specific model parameters would tell us where to manipulate knowledge in models.
Our results suggest, counterintuitively, that better mechanistic understanding of how pretrained language models work may not always translate to insights about how to best change their behavior.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models learn a great quantity of factual information during
pretraining, and recent work localizes this information to specific model
weights like mid-layer MLP weights. In this paper, we find that we can change
how a fact is stored in a model by editing weights that are in a different
location than where existing methods suggest that the fact is stored. This is
surprising because we would expect that localizing facts to specific model
parameters would tell us where to manipulate knowledge in models, and this
assumption has motivated past work on model editing methods. Specifically, we
show that localization conclusions from representation denoising (also known as
Causal Tracing) do not provide any insight into which model MLP layer would be
best to edit in order to override an existing stored fact with a new one. This
finding raises questions about how past work relies on Causal Tracing to select
which model layers to edit. Next, we consider several variants of the editing
problem, including erasing and amplifying facts. For one of our editing
problems, editing performance does relate to localization results from
representation denoising, but we find that which layer we edit is a far better
predictor of performance. Our results suggest, counterintuitively, that better
mechanistic understanding of how pretrained language models work may not always
translate to insights about how to best change their behavior. Our code is
available at https://github.com/google/belief-localization
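
To make the localization method concrete, the block below sketches Causal Tracing (representation denoising) on GPT-2: noise the subject token embeddings, then restore the clean hidden state at a single layer and measure how much of the model's original prediction of the fact is recovered. This is a minimal illustrative sketch, not the authors' implementation (that is in the repository above); the example prompt, the hard-coded subject token positions, the noise scale, and the helper names are assumptions made for brevity.

```python
# Minimal sketch of Causal Tracing (representation denoising) on GPT-2.
# Illustrative reimplementation, not the paper's code; prompt, subject
# positions, and noise scale are assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "The Eiffel Tower is located in the city of"
inputs = tok(prompt, return_tensors="pt")
target_id = tok(" Paris")["input_ids"][0]  # first token of the expected answer
subject_positions = [1, 2, 3, 4]           # positions of " Eiffel Tower" in this prompt
noise = 0.5 * torch.randn(len(subject_positions), model.config.n_embd)

def p_target(logits):
    """Probability of the correct answer token at the final position."""
    return torch.softmax(logits[0, -1], dim=-1)[target_id].item()

def corrupt_embeddings(module, inp, out):
    """Forward hook on the token embedding: noise the subject tokens."""
    out = out.clone()
    out[0, subject_positions] += noise
    return out

# 1) Clean run: cache every layer's hidden states.
with torch.no_grad():
    clean = model(**inputs, output_hidden_states=True)
clean_hidden = clean.hidden_states  # index i+1 ~ output of block i

# 2) Corrupted run: with the subject noised, p(target) should drop.
handle = model.transformer.wte.register_forward_hook(corrupt_embeddings)
with torch.no_grad():
    corrupted = model(**inputs)
handle.remove()

# 3) Denoising runs: corrupt the input but restore the clean hidden state at
#    one (layer, token) pair; layers whose restoration recovers p(target)
#    are where Causal Tracing "localizes" the fact.
def tracing_effect(layer_idx, position):
    def restore(module, inp, out):
        hidden = out[0]  # a GPT2Block returns a tuple; [0] is the hidden states
        hidden[0, position] = clean_hidden[layer_idx + 1][0, position]
        return (hidden,) + out[1:]

    h1 = model.transformer.wte.register_forward_hook(corrupt_embeddings)
    h2 = model.transformer.h[layer_idx].register_forward_hook(restore)
    with torch.no_grad():
        patched = model(**inputs)
    h1.remove(); h2.remove()
    return p_target(patched.logits) - p_target(corrupted.logits)

print(f"clean p = {p_target(clean.logits):.3f}, "
      f"corrupted p = {p_target(corrupted.logits):.3f}")
for layer in range(model.config.n_layer):
    effect = tracing_effect(layer, subject_positions[-1])
    print(f"layer {layer:2d}: tracing effect = {effect:+.3f}")
```

In the paper's framing, a high tracing effect at a layer is the localization signal one might expect to predict which layer is best to edit; the surprising result reported above is that it does not.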
Related papers
- Should We Really Edit Language Models? On the Evaluation of Edited Language Models (arXiv, 2024-10-24)
  Existing editing methods lead to inevitable performance deterioration on general benchmarks.
  Instruction-tuned models are more robust to editing, showing less performance drop on general knowledge after editing.
  Our findings indicate that current editing methods are only suitable for small-scale knowledge updates within language models.
- Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? (arXiv, 2024-06-27)
  This paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research.
  We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place.
  Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent.
- WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models (arXiv, 2024-05-23)
  Large language models (LLMs) need knowledge updates to keep pace with ever-growing world facts and to correct hallucinated responses.
  Where the updated knowledge should reside in memory is a fundamental question for model editing.
  We propose WISE to bridge the gap between the model's long-term memory (its parameters) and its working memory (retrieved, non-parametric knowledge).
- On Mechanistic Knowledge Localization in Text-to-Image Generative Models (arXiv, 2024-05-02)
  We introduce the concept of Mechanistic Localization in text-to-image models.
  We measure the direct effect of intermediate layers on output generation by performing interventions in the cross-attention layers of the UNet.
  We employ LocoEdit, a fast closed-form editing method, across popular open-source text-to-image models.
- "Flex Tape Can't Fix That": Bias and Misinformation in Edited Language Models (arXiv, 2024-02-29)
  We investigate how model editing methods unexpectedly amplify model biases post-edit.
  Specifically, we focus on biases with respect to demographic attributes such as race, geographic origin, and gender.
  We find that edited models exhibit, to varying degrees, more biased behavior as they become less confident in attributes for Asian, African, and South American subjects.
- Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors (arXiv, 2022-11-20)
  We propose GRACE, a lifelong model editing method that implements spot-fixes on streaming errors of a deployed model.
  GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering model weights (a minimal sketch of this idea follows this list).
  Our experiments on T5, BERT, and GPT models show GRACE's state-of-the-art performance in making and retaining edits while generalizing to unseen inputs.
- Memory-Based Model Editing at Scale (arXiv, 2022-06-13)
  Existing model editors struggle to accurately model an edit's intended scope.
  We propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC).
  SERAC stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed.
- Editing Factual Knowledge in Language Models (arXiv, 2021-04-16)
  We present KnowledgeEditor, a method that can be used to edit the factual knowledge stored in language models.
  Besides being computationally efficient, KnowledgeEditor does not require any modifications in LM pre-training.
  We show KnowledgeEditor's efficacy with two popular architectures and knowledge-intensive tasks.
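
Since the GRACE entry above describes its mechanism, the block below sketches a GRACE-style discrete key-value adaptor under simplifying assumptions: the class name CodebookAdaptor, the add_edit helper, and the fixed deferral radius are ours rather than the GRACE codebase, and the replacement value is stored directly instead of being learned to produce the desired edited output, as the actual method does.

```python
# Illustrative GRACE-style adaptor: wrap one layer with a codebook of
# (key, value) edits and apply an edit when an incoming activation falls
# within a deferral radius of a stored key. Simplified; not the GRACE code.
import torch
import torch.nn as nn

class CodebookAdaptor(nn.Module):
    def __init__(self, layer: nn.Module, radius: float = 0.5):
        super().__init__()
        self.layer = layer      # frozen base layer; its weights are never edited
        self.radius = radius    # deferral radius around each stored key
        self.keys, self.values = [], []

    def add_edit(self, key: torch.Tensor, value: torch.Tensor):
        """Store an edit: `key` is the layer input for the edited example,
        `value` is the replacement output that realizes the new behavior."""
        self.keys.append(key.detach())
        self.values.append(value.detach())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.layer(x)
        if not self.keys:
            return out
        keys = torch.stack(self.keys)      # (n_edits, d_in)
        dists = torch.cdist(x, keys)       # (batch, n_edits)
        nearest = dists.argmin(dim=-1)
        within = dists.min(dim=-1).values < self.radius
        for i in torch.nonzero(within).flatten():   # spot-fix matching inputs only
            out[i] = self.values[int(nearest[i])]
        return out

# Toy usage: edit one input without touching the base layer's weights.
base = nn.Linear(4, 4)
adapted = CodebookAdaptor(base, radius=0.5)
x_edit = torch.randn(1, 4)
adapted.add_edit(x_edit[0], value=torch.ones(4))
print(adapted(x_edit))        # hits the codebook: returns the stored value
print(adapted(x_edit + 3.0))  # outside the radius: base layer output
```

In GRACE itself the value vectors are optimized so that the wrapped model produces the desired edited output and the radii are managed as the codebook grows; the sketch only shows the routing idea, which keeps edits local and leaves the pretrained weights untouched.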