Related papers: The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse

URL: http://arxiv.org/abs/2402.09656v4
Date: Wed, 5 Jun 2024 09:43:00 GMT
Title: The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse
Authors: Wanli Yang, Fei Sun, Xinyu Ma, Xun Liu, Dawei Yin, Xueqi Cheng
Abstract summary: Even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks. Benchmarking Large Language Models after each edit is impractically time-consuming and resource-intensive. We have utilized GPT-3.5 to develop a new dataset, HardEdit, based on hard cases.
Score: 58.0132400208411
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Although model editing has shown promise in revising knowledge in Large Language Models (LLMs), its impact on the inherent capabilities of LLMs is often overlooked. In this work, we reveal a critical phenomenon: even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks. However, benchmarking LLMs after each edit, while necessary to prevent such collapses, is impractically time-consuming and resource-intensive. To mitigate this, we propose using perplexity as a surrogate metric, validated by extensive experiments demonstrating changes in an edited model's perplexity are strongly correlated with its downstream task performances. We further conduct an in-depth study on sequential editing, a practical setting for real-world scenarios, across various editing methods and LLMs, focusing on hard cases from our previous single edit studies. The results indicate that nearly all examined editing methods result in model collapse after only few edits. To facilitate further research, we have utilized GPT-3.5 to develop a new dataset, HardEdit, based on those hard cases. This dataset aims to establish the foundation for pioneering research in reliable model editing and the mechanisms underlying editing-induced model collapse. We hope this work can draw the community's attention to the potential risks inherent in model editing practices.
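As a rough illustration of the surrogate metric mentioned in the abstract, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below is not the authors' code: the function names, the ratio threshold, and the toy log-probabilities are all hypothetical, and in practice the token log-probabilities would come from scoring a fixed reference text with the model before and after an edit.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def looks_collapsed(ppl_before: float, ppl_after: float,
                    ratio_threshold: float = 10.0) -> bool:
    """Flag an edit when perplexity grows by more than a chosen factor.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return ppl_after > ratio_threshold * ppl_before


if __name__ == "__main__":
    # Made-up log-probabilities for the same four tokens before and after an edit.
    before = [math.log(0.5)] * 4
    after = [math.log(0.001)] * 4
    ppl_before = perplexity(before)
    ppl_after = perplexity(after)
    print(ppl_before, ppl_after, looks_collapsed(ppl_before, ppl_after))
```

Comparing a ratio rather than an absolute value keeps the check independent of the base model's starting perplexity, which varies across models and reference texts.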

Related papers

InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing [77.47790551485721]
In-context learning is a promising editing method by comprehending edit information through context encoding. This method is constrained by the limited context window of large language models. We propose InComeS, a flexible framework that enhances LLMs' ability to process editing contexts.
arXiv Detail & Related papers (2025-05-28T09:20:18Z)
The Mirage of Model Editing: Revisiting Evaluation in the Wild [70.17413507444704]
We study the effectiveness of model editing in question answering applications. Our single editing experiments indicate that current editing methods perform substantially worse than previously reported. Our analysis provides a fundamental reexamination of both the real-world applicability of existing model editing methods and their evaluation practices.
arXiv Detail & Related papers (2025-02-16T15:57:55Z)
Reasons and Solutions for the Decline in Model Performance after Editing [17.756172082400163]
This paper explores the reasons for the performance decline of the edited model and optimizes the editing method. The performance of the editing model is mainly affected by the diversity of editing targets and sequence length. In order to improve the performance of the editing model, this paper proposes a Dump for Sequence (D4S) method.
arXiv Detail & Related papers (2024-10-31T11:49:44Z)
FAME: Towards Factual Multi-Task Model Editing [4.858226284963096]
Large language models (LLMs) embed extensive knowledge and utilize it to perform exceptionally well across various tasks. We present FAME, a factual, comprehensive, and multi-task dataset, which is designed to enhance the practicality of model editing. We then propose SKEME, a model editing method that uses a novel caching mechanism to ensure synchronization with the real world.
arXiv Detail & Related papers (2024-10-07T13:46:06Z)
ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA [55.697627106315004]
Large language models (LLMs) require model editing to efficiently update specific knowledge within them and avoid factual errors. Previous approaches manage sequential edits by freezing original parameters and discretely allocating new parameters for each knowledge update. We propose ELDER, a novel approach to create a continuous association between data and adapters.
arXiv Detail & Related papers (2024-08-19T02:27:00Z)
Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? [61.68363765350178]
This paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research. We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place. Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent.
arXiv Detail & Related papers (2024-06-27T17:33:03Z)
Perturbation-Restrained Sequential Model Editing [33.51709226068619]
Current model editing methods compromise the general abilities of large language models (LLMs) as the number of edits increases. We propose a framework termed Perturbation Restraint on Upper bouNd for Editing (PRUNE). PRUNE can preserve considerable general abilities while maintaining the editing performance effectively in sequential model editing.
arXiv Detail & Related papers (2024-05-27T04:40:56Z)
Consecutive Batch Model Editing with HooK Layers [59.673084839708224]
CoachHooK is a model editing method that simultaneously supports sequential and batch editing. It is memory-friendly, as it needs only a small amount of memory to store several hook layers whose size remains unchanged over time.
arXiv Detail & Related papers (2024-03-08T14:07:44Z)
Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue [122.20016030723043]
We evaluate the side effects of model editing on large language models (LLMs). Our analysis reveals that the side effects are caused by model editing altering the original model weights excessively. To mitigate this, a method named RECT is proposed to regularize the edit update weights.
arXiv Detail & Related papers (2024-01-09T18:03:15Z)
Editing Large Language Models: Problems, Methods, and Opportunities [51.903537096207]
This paper embarks on a deep exploration of the problems, methods, and opportunities related to model editing for LLMs. We provide an exhaustive overview of the task definition and challenges associated with model editing, along with an in-depth empirical analysis of the most progressive methods currently at our disposal. Our objective is to provide valuable insights into the effectiveness and feasibility of each editing technique, thereby assisting the community in making informed decisions on the selection of the most appropriate method for a specific task or context.
arXiv Detail & Related papers (2023-05-22T16:00:00Z)
Edit at your own risk: evaluating the robustness of edited models to distribution shifts [0.0]
We investigate how model editing affects the general robustness of a model, as well as the robustness of the specific behavior targeted by the edit. We find that edits tend to reduce general robustness, but that the degree of degradation depends on the editing algorithm and layers chosen. Motivated by these observations, we introduce a new model editing algorithm, 1-layer interpolation (1-LI), which uses weight-space interpolation to navigate the trade-off between editing task accuracy and general robustness.
arXiv Detail & Related papers (2023-02-28T19:41:37Z)
Memory-Based Model Editing at Scale [102.28475739907498]
Existing model editors struggle to accurately model an edit's intended scope. We propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC). SERAC stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed.
arXiv Detail & Related papers (2022-06-13T23:40:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.