Fugu-MT Paper Translation (Overview): Is Model Editing Built on Sand? Revealing Its Illusory Success and Fragile Foundation

Paper overview: Is Model Editing Built on Sand? Revealing Its Illusory Success and Fragile Foundation

arxiv url: http://arxiv.org/abs/2510.00625v1
Date: Wed, 01 Oct 2025 07:59:23 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:20.452485
Title: Is Model Editing Built on Sand? Revealing Its Illusory Success and Fragile Foundation
Authors: Wei Liu, Haomei Xu, Bingqing Liu, Zhiying Deng, Haozhao Wang, Jun Wang, Ruixuan Li, Yee Whye Teh, Wee Sun Lee
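Abstract summary: Large language models (LLMs) inevitably encode outdated or incorrect knowledge. Updating, deleting, and forgetting such knowledge is important for alignment, safety, and other issues. To address this, model editing has emerged as a promising paradigm: it precisely edits a small subset of parameters so that a specific fact is updated while other knowledge is preserved. Despite the great success reported in previous papers, the reliability of editing turns out to rest on a fragile foundation. Our empirical evidence shows that editing is likely to be based on shortcuts rather than full semantics, calling for an urgent reconsideration of the foundations of model editing before further progress.
Reference score (attention score computed in-house): 50.40861036534546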
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) inevitably encode outdated or incorrect knowledge. Updating, deleting, and forgetting such knowledge is important for alignment, safety, and other issues. To address this issue, model editing has emerged as a promising paradigm: by precisely editing a small subset of parameters such that a specific fact is updated while preserving other knowledge. Despite its great success reported in previous papers, we find the apparent reliability of editing rests on a fragile foundation and the current literature is largely driven by illusory success. The fundamental goal of steering the model's output toward a target with minimal modification would encourage exploiting hidden shortcuts, rather than utilizing real semantics. This problem directly challenges the feasibility of the current model editing literature at its very foundation, as shortcuts are inherently at odds with robust knowledge integration. Coincidentally, this issue has long been obscured by evaluation frameworks that lack the design of negative examples. To uncover it, we systematically develop a suite of new evaluation methods. Strikingly, we find that state-of-the-art approaches collapse even under the simplest negation queries. Our empirical evidence shows that editing is likely to be based on shortcuts rather than full semantics, calling for an urgent reconsideration of the very basis of model editing before further advancements can be meaningfully pursued.
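The abstract makes two connected claims. First, steering a model's output toward a target with minimal modification invites shortcuts. Second, evaluations without negative examples hide those shortcuts.

The toy sketch below is not taken from the paper. It models a single linear layer W as a key-to-value memory and applies the minimum-norm rank-one update that forces W k* = v*. This is similar in spirit to rank-one editors such as ROME, but without their key-covariance weighting. The three key features, the example prompts, and the target value are all invented for illustration.

Under these assumptions, an unrelated (orthogonal) key is left untouched, so a locality check passes. A negated query whose key mostly overlaps the edited key, however, is pulled to nearly the same new target.

```python
# Toy illustration only (hypothetical, not the paper's method or data):
# a linear layer W viewed as a key -> value memory, edited with the
# minimum-norm rank-one update that forces W' k* = v*.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dist(a, b):
    """Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def rank_one_edit(W, k_star, v_star):
    """Return W + (v* - W k*) k*^T / (k*^T k*): the smallest change to W
    (in Frobenius norm) that maps k_star exactly to v_star."""
    residual = [v - wk for v, wk in zip(v_star, matvec(W, k_star))]
    scale = dot(k_star, k_star)
    return [[w + r * k / scale for w, k in zip(row, k_star)]
            for row, r in zip(W, residual)]


# Hypothetical 3-d keys with features [subject, relation, negation marker].
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
k_fact = [1.0, 1.0, 0.0]         # e.g. "The capital of X is ..."
k_negated = [1.0, 1.0, 0.2]      # e.g. "The capital of X is not ..."
k_unrelated = [1.0, -1.0, 0.0]   # orthogonal to k_fact
v_new = [0.0, 0.0, 5.0]          # hypothetical value encoding the new fact

W_edit = rank_one_edit(W, k_fact, v_new)

for name, k in [("fact", k_fact), ("negated", k_negated),
                ("unrelated", k_unrelated)]:
    before, after = matvec(W, k), matvec(W_edit, k)
    print(f"{name:9s} before={before} after={after} "
          f"dist_to_target={dist(after, v_new):.3f}")
```

The edit shifts any query x by (v* - W k*) scaled by (k* . x) / (k* . k*). The negated key has the same projection onto k* as the edited key, so it receives the full change and lands within 0.2 of the new target. The orthogonal key receives no change at all. A check limited to the edited prompt and unrelated facts would therefore report success, and only the negated probe exposes the problem. Real models and the paper's evaluation suite are far more complex than this three-dimensional example.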
