Fugu-MT Paper Translation (Abstract): Quantifying Edits Decay in Fine-tuned LLMs

Paper summary: Quantifying Edits Decay in Fine-tuned LLMs

arxiv url: http://arxiv.org/abs/2511.05852v1
Date: Sat, 08 Nov 2025 04:58:03 GMT
Status: Translation complete
Last updated in system: 2025-11-11 21:18:44.617834
Title: Quantifying Edits Decay in Fine-tuned LLMs
Title (reference translation): Quantifying Edit Decay in Fine-tuned LLMs
Authors: Yinjie Cheng, Paul Youssef, Christin Seifert, Jörg Schlötterer, Zhixue Zhao
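Abstract summary: This study examines how fine-tuning affects knowledge editing. We evaluate two state-of-the-art editing methods (MEMIT, AlphaEdit) and three fine-tuning approaches. The results show that edits decay after fine-tuning, with survival varying across configurations.
Reference score (independently computed attention measure): 17.377278510871843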
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Knowledge editing has emerged as a lightweight alternative to retraining for correcting or injecting specific facts in large language models (LLMs). Meanwhile, fine-tuning remains the default operation for adapting LLMs to new domains and tasks. Despite their widespread adoption, these two post-training interventions have been studied in isolation, leaving open a crucial question: if we fine-tune an edited model, do the edits survive? This question is motivated by two practical scenarios: removing covert or malicious edits, and preserving beneficial edits. If fine-tuning impairs edits as shown in Figure 1, current KE methods become less useful, as every fine-tuned model would require re-editing, which significantly increases the cost; if edits persist, fine-tuned models risk propagating hidden malicious edits, raising serious safety concerns. To this end, we systematically quantify edits decay after fine-tuning, investigating how fine-tuning affects knowledge editing. We evaluate two state-of-the-art editing methods (MEMIT, AlphaEdit) and three fine-tuning approaches (full-parameter, LoRA, DoRA) across five LLMs and three datasets, yielding 232 experimental configurations. Our results show that edits decay after fine-tuning, with survival varying across configurations, e.g., AlphaEdit edits decay more than MEMIT edits. Further, we propose selective-layer fine-tuning and find that fine-tuning edited layers only can effectively remove edits, though at a slight cost to downstream performance. Surprisingly, fine-tuning non-edited layers impairs more edits than full fine-tuning. Overall, our study establishes empirical baselines and actionable strategies for integrating knowledge editing with fine-tuning, and underscores that evaluating model editing requires considering the full LLM application pipeline.
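Abstract (reference translation): Knowledge editing has emerged as a lightweight alternative to retraining for correcting or injecting specific facts in large language models (LLMs). Fine-tuning, meanwhile, remains the default way to adapt LLMs to new domains and tasks. Although both are widely adopted, these two post-training interventions have been studied separately, which leaves an important question open: if an edited model is fine-tuned, do the edits survive? The question is motivated by two practical scenarios: removing covert or malicious edits, and preserving beneficial ones. If fine-tuning impairs edits, as shown in Figure 1, current KE methods become less useful, because every fine-tuned model would need re-editing, which significantly increases cost; if edits persist, fine-tuned models risk propagating hidden malicious edits, which raises serious safety concerns. We therefore systematically quantify edit decay after fine-tuning and investigate how fine-tuning affects knowledge editing. We evaluate two state-of-the-art editing methods (MEMIT, AlphaEdit) and three fine-tuning approaches (full-parameter, LoRA, DoRA) across five LLMs and three datasets, giving 232 experimental configurations. The results show that edits decay after fine-tuning and that survival varies across configurations; for example, AlphaEdit edits decay more than MEMIT edits. We further propose selective-layer fine-tuning and find that fine-tuning only the edited layers can effectively remove edits, at a slight cost to downstream performance. Surprisingly, fine-tuning the non-edited layers impairs more edits than full fine-tuning. Overall, the study establishes empirical baselines and actionable strategies for integrating knowledge editing with fine-tuning, and underscores that evaluating model editing requires considering the full LLM application pipeline.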
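The selective-layer fine-tuning idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give a decay formula or say how layers are selected, so the function names (`select_trainable`, `edit_decay`), the relative-drop definition of decay, and the `layers.<index>.` parameter naming pattern are all assumptions made for illustration. The sketch only decides which parameter names would stay trainable under each regime and computes a simple before/after drop in edit success.

```python
import re
from typing import Iterable, List

# Assumption: parameter names contain "layers.<index>." to mark the
# transformer block they belong to (a common checkpoint naming scheme).
_LAYER_RE = re.compile(r"\blayers\.(\d+)\.")


def select_trainable(
    param_names: Iterable[str],
    edited_layers: Iterable[int],
    mode: str,
) -> List[str]:
    """Return the parameter names that remain trainable.

    mode="edited":     only parameters inside the edited layers
    mode="non_edited": every parameter outside the edited layers
    mode="full":       all parameters
    """
    if mode not in {"edited", "non_edited", "full"}:
        raise ValueError(f"unknown mode: {mode!r}")
    edited = set(edited_layers)
    selected = []
    for name in param_names:
        match = _LAYER_RE.search(name)
        # Parameters without a layer index (embeddings, output head)
        # are treated as non-edited.
        in_edited = match is not None and int(match.group(1)) in edited
        if mode == "full" or (mode == "edited") == in_edited:
            selected.append(name)
    return selected


def edit_decay(success_before: float, success_after: float) -> float:
    """Relative drop in edit success after fine-tuning.

    Hypothetical metric for illustration; the paper may define decay
    differently.
    """
    if success_before <= 0:
        raise ValueError("success_before must be positive")
    return (success_before - success_after) / success_before


if __name__ == "__main__":
    names = [
        "model.embed_tokens.weight",
        "model.layers.3.mlp.down_proj.weight",
        "model.layers.4.mlp.down_proj.weight",
        "lm_head.weight",
    ]
    print(select_trainable(names, edited_layers=[4], mode="edited"))
    print(select_trainable(names, edited_layers=[4], mode="non_edited"))
    print(edit_decay(success_before=1.0, success_after=0.25))
```

With a PyTorch model, the returned list could be used to set `requires_grad` on each entry of `model.named_parameters()` before fine-tuning. Treating a whole block as "edited" is a simplification; an editing method may modify only specific weight matrices inside a layer.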

Related papers