Quantifying Edits Decay in Fine-tuned LLMs
- URL: http://arxiv.org/abs/2511.05852v2
- Date: Thu, 13 Nov 2025 01:34:45 GMT
- Title: Quantifying Edits Decay in Fine-tuned LLMs
- Authors: Yinjie Cheng, Paul Youssef, Christin Seifert, Jörg Schlötterer, Zhixue Zhao
- Abstract summary: This study investigates how fine-tuning affects knowledge editing. We evaluate two state-of-the-art editing methods (MEMIT, AlphaEdit) and three fine-tuning approaches. Our results show that edits decay after fine-tuning, with survival varying across configurations.
- Score: 17.377278510871843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge editing has emerged as a lightweight alternative to retraining for correcting or injecting specific facts in large language models (LLMs). Meanwhile, fine-tuning remains the default operation for adapting LLMs to new domains and tasks. Despite their widespread adoption, these two post-training interventions have been studied in isolation, leaving open a crucial question: if we fine-tune an edited model, do the edits survive? This question is motivated by two practical scenarios: removing covert or malicious edits, and preserving beneficial edits. If fine-tuning impairs edits as shown in Figure 1, current knowledge editing (KE) methods become less useful, as every fine-tuned model would require re-editing, which significantly increases the cost; if edits persist, fine-tuned models risk propagating hidden malicious edits, raising serious safety concerns. To this end, we systematically quantify edits decay after fine-tuning, investigating how fine-tuning affects knowledge editing. We evaluate two state-of-the-art editing methods (MEMIT, AlphaEdit) and three fine-tuning approaches (full-parameter, LoRA, DoRA) across five LLMs and three datasets, yielding 232 experimental configurations. Our results show that edits decay after fine-tuning, with survival varying across configurations, e.g., AlphaEdit edits decay more than MEMIT edits. Further, we propose selective-layer fine-tuning and find that fine-tuning edited layers only can effectively remove edits, though at a slight cost to downstream performance. Surprisingly, fine-tuning non-edited layers impairs more edits than full fine-tuning. Overall, our study establishes empirical baselines and actionable strategies for integrating knowledge editing with fine-tuning, and underscores that evaluating model editing requires considering the full LLM application pipeline.
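The selective-layer strategy in the abstract is straightforward to express in code. Below is a minimal PyTorch sketch of fine-tuning only a chosen subset of transformer blocks: freezing everything except the edited layers corresponds to the edited-layers-only setting, and inverting the set gives the non-edited-layers setting. The toy model, layer indices, and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of selective-layer fine-tuning: freeze every transformer
# block except the ones a locate-and-edit method (e.g. MEMIT) touched.
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Stand-in for a decoder-only LLM with a stack of blocks."""
    def __init__(self, d_model: int = 64, n_layers: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x

def freeze_except(model: ToyLM, trainable_layers: set[int]) -> None:
    """Leave only `trainable_layers` unfrozen (edited-layers-only FT);
    invert the set to fine-tune the non-edited layers instead."""
    for idx, layer in enumerate(model.layers):
        requires_grad = idx in trainable_layers
        for p in layer.parameters():
            p.requires_grad_(requires_grad)

model = ToyLM()
edited_layers = {4, 5, 6}  # hypothetical layers the editor modified
freeze_except(model, edited_layers)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```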
Related papers
- How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models [7.342540592387184]
We investigate the interaction between model editing and fine-tuning in the context of T2I diffusion models. Our findings reveal a trend: edits generally fail to persist through fine-tuning, even when fine-tuning is tangential or unrelated to the edits. These findings highlight the need for more robust techniques to ensure reliable long-term control and alignment of deployed AI systems.
arXiv Detail & Related papers (2025-06-23T09:10:29Z)
- MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs [76.28901550926021]
Existing methods for lifelong model editing compromise generalization, interfere with past edits, or fail to scale to long editing sequences. We propose MEMOIR, a novel scalable framework that injects knowledge through a residual memory, while preserving the core capabilities of the pre-trained model. MEMOIR achieves state-of-the-art performance across reliability, generalization, and locality metrics, scaling to thousands of sequential edits with minimal forgetting.
arXiv Detail & Related papers (2025-06-09T16:16:42Z)
- Resolving UnderEdit & OverEdit with Iterative & Neighbor-Assisted Model Editing [10.54738347540608]
Large Language Models (LLMs) are widely deployed in downstream tasks, but keeping their knowledge up-to-date via retraining or fine-tuning is often computationally expensive. Model editing provides a more efficient alternative by updating a targeted subset of parameters, which often follows the locate-and-edit paradigm. We propose two complementary methods: iterative model editing, which applies successive edits to mitigate UnderEdit, and neighbor-assisted model editing, which incorporates neighboring knowledge during editing to reduce OverEdit.
arXiv Detail & Related papers (2025-03-14T21:53:12Z)
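A hedged sketch of the iterative-editing idea from the summary above: re-apply an edit until the model actually prefers the target, bounded by a retry budget so successive edits do not tip into OverEdit. `apply_edit` and `target_probability` are hypothetical stand-ins for a locate-and-edit backend and an evaluator, not APIs from the paper.

```python
# Iterative model editing against UnderEdit: keep re-applying an edit
# until the model prefers the target answer or the retry budget runs out.
from typing import Any, Callable

def edit_until_reliable(
    model: Any,
    request: dict,                        # e.g. {"prompt": ..., "target": ...}
    apply_edit: Callable[[Any, dict], Any],
    target_probability: Callable[[Any, dict], float],
    threshold: float = 0.5,
    max_rounds: int = 5,
) -> Any:
    for _ in range(max_rounds):
        if target_probability(model, request) >= threshold:
            break                         # edit took hold; stop to limit OverEdit
        model = apply_edit(model, request)
    return model
```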
- Constraining Sequential Model Editing with Editing Anchor Compression [40.93064933191375]
Large language models (LLMs) struggle with hallucinations due to false or outdated knowledge. This paper statistically observes that the parameter matrix after editing exhibits a significant deviation compared to its previous state as the number of edits increases. A framework termed Editing Anchor Compression (EAC) is proposed to constrain the deviation of the parameter matrix during sequential editing.
arXiv Detail & Related papers (2025-02-25T03:56:49Z)
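EAC's actual anchor-compression procedure is more involved than this, but the core idea of constraining the edited parameter matrix's deviation can be illustrated generically: measure how far the cumulative update has drifted from the pre-edit weights and rescale it when it exceeds a budget. A minimal sketch, with the norm budget as an assumed hyperparameter:

```python
# Generic illustration of constraining parameter-matrix deviation during
# sequential editing (the spirit of EAC, not its exact algorithm).
import torch

def constrain_deviation(
    w_orig: torch.Tensor, w_edited: torch.Tensor, budget: float
) -> torch.Tensor:
    delta = w_edited - w_orig
    norm = torch.linalg.norm(delta)      # Frobenius norm for matrices
    if norm > budget:
        delta = delta * (budget / norm)  # project back onto the norm ball
    return w_orig + delta
```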
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models [63.209935157623946]
Large language models (LLMs) often exhibit hallucinations due to incorrect or outdated knowledge. We introduce AlphaEdit, a novel solution that projects the perturbation onto the null space of the preserved knowledge before applying it to the parameters. We theoretically prove that this projection ensures the output of post-edited LLMs remains unchanged when queried about the preserved knowledge.
arXiv Detail & Related papers (2024-10-03T10:06:27Z)
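The null-space projection AlphaEdit describes is standard linear algebra and easy to sketch: build the projector P = I - K K^+ from the preserved-knowledge key matrix K, so any perturbation multiplied by P annihilates those keys and the edited weights produce exactly the same outputs on preserved queries. Shapes and variable names below are illustrative assumptions:

```python
# Sketch of the null-space projection at the heart of AlphaEdit: project a
# weight perturbation so it annihilates the preserved-knowledge key matrix.
import torch

def project_to_null_space(delta: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """delta: (d_out, d_in) perturbation; keys: (d_in, n) preserved keys.
    Returns delta' with delta' @ keys == 0, so preserved activations
    (W + delta') @ k stay exactly W @ k."""
    # P = I - K K^+ projects onto the orthogonal complement of col(K)
    projector = torch.eye(keys.shape[0]) - keys @ torch.linalg.pinv(keys)
    return delta @ projector

delta = torch.randn(16, 32)
keys = torch.randn(32, 5)
delta_null = project_to_null_space(delta, keys)
print(torch.allclose(delta_null @ keys, torch.zeros(16, 5), atol=1e-5))  # True
```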
- Perturbation-Restrained Sequential Model Editing [33.51709226068619]
Current model editing methods compromise the general abilities of large language models (LLMs) as the number of edits increases. A framework termed Perturbation Restraint on Upper bouNd for Editing (PRUNE) is proposed, which applies condition number restraints in sequential editing. The results show that PRUNE can preserve general abilities while effectively maintaining editing performance in sequential model editing.
arXiv Detail & Related papers (2024-05-27T04:40:56Z)
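PRUNE's exact upper-bound formulation differs, but a condition-number restraint on an edited weight matrix can be illustrated by clipping its singular values so that cond(W) = σ_max/σ_min stays below a chosen bound; a hedged sketch:

```python
# Illustration of a condition-number restraint in the spirit of PRUNE
# (not its exact formulation): raise the smallest singular values of an
# edited weight matrix so cond(W) = s_max / s_min stays below a bound.
import torch

def restrain_condition_number(w: torch.Tensor, max_cond: float) -> torch.Tensor:
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    floor = (s.max() / max_cond).item()   # smallest singular value allowed
    s_clipped = s.clamp(min=floor)
    return u @ torch.diag(s_clipped) @ vh
```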
- The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse [58.0132400208411]
Even a single edit can trigger model collapse, manifesting as significant performance degradation on various benchmark tasks. Since benchmarking LLMs after each edit is impractically time-consuming and resource-intensive, we use GPT-3.5 to develop a new dataset, HardEdit, based on hard editing cases.
arXiv Detail & Related papers (2024-02-15T01:50:38Z)
- Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue [122.20016030723043]
We evaluate the side effects of model editing on large language models (LLMs).
Our analysis reveals that the side effects are caused by model editing altering the original model weights excessively.
To mitigate this, a method named RECT is proposed to regularize the edit update weights.
arXiv Detail & Related papers (2024-01-09T18:03:15Z)
- Object-aware Inversion and Reassembly for Image Editing [61.19822563737121]
We propose Object-aware Inversion and Reassembly (OIR) to enable object-level fine-grained editing.
We use our search metric to find the optimal inversion step for each editing pair when editing an image.
Our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.
arXiv Detail & Related papers (2023-10-18T17:59:02Z)
- Memory-Based Model Editing at Scale [102.28475739907498]
Existing model editors struggle to accurately model an edit's intended scope.
We propose Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model (SERAC)
SERAC stores edits in an explicit memory and learns to reason over them to modulate the base model's predictions as needed.
arXiv Detail & Related papers (2022-06-13T23:40:34Z)
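SERAC's architecture lends itself to a compact routing sketch: edits live in an explicit memory, a scope classifier scores whether a query falls under any stored edit, and in-scope queries are answered by a small counterfactual model while everything else passes to the frozen base model. All components below are hypothetical stand-ins, not the paper's implementation:

```python
# SERAC-style routing: explicit edit memory + scope classifier +
# counterfactual model, with the frozen base model as the fallback.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SeracRouter:
    base_model: Callable[[str], str]
    counterfactual_model: Callable[[str, str], str]   # (query, edit) -> answer
    in_scope: Callable[[str, str], float]             # (query, edit) -> score
    memory: list[str] = field(default_factory=list)   # stored edit descriptors

    def add_edit(self, edit: str) -> None:
        self.memory.append(edit)

    def __call__(self, query: str, threshold: float = 0.5) -> str:
        if self.memory:
            # retrieve the stored edit most relevant to this query
            best = max(self.memory, key=lambda e: self.in_scope(query, e))
            if self.in_scope(query, best) >= threshold:
                return self.counterfactual_model(query, best)
        return self.base_model(query)     # out of scope: base model unchanged
```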