Fugu-MT paper translation (abstract): Surgical Knowledge Rewrite in Compact LLMs: An 'Unlearn-then-Learn' Strategy with ($IA^3$) for Localized Factual Modulation and Catastrophic Forgetting Mitigation

Paper summary: Surgical Knowledge Rewrite in Compact LLMs: An 'Unlearn-then-Learn' Strategy with ($IA^3$) for Localized Factual Modulation and Catastrophic Forgetting Mitigation

arxiv url: http://arxiv.org/abs/2508.07075v1
Date: Sat, 09 Aug 2025 18:48:25 GMT
Status: Translation complete
Last updated in system: 2025-08-12 21:23:28.685247
Title: Surgical Knowledge Rewrite in Compact LLMs: An 'Unlearn-then-Learn' Strategy with ($IA^3$) for Localized Factual Modulation and Catastrophic Forgetting Mitigation
Authors: Stanley Ngugi
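Abstract summary: This paper introduces and evaluates a novel "unlearn-then-learn" strategy for precise knowledge editing in large language models. The two-stage approach is powered by an initial circuit localization phase that identifies and targets the specific internal components responsible for encoding the conflicting fact.
Reference score (site-computed attention score): 0.0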
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) struggle with dynamic knowledge updates, especially when new information conflicts with deeply embedded facts. Such conflicting factual edits often lead to two critical issues: resistance to adopting the new fact and severe catastrophic forgetting of unrelated knowledge. This paper introduces and evaluates a novel "unlearn-then-learn" strategy for precise knowledge editing in LLMs, leveraging the parameter-efficient fine-tuning (PEFT) technique, Infused Adapter by Inhibiting and Amplifying Inner Activations ($IA^3$). Crucially, this two-stage approach is powered by an initial circuit localization phase that identifies and targets the specific internal components responsible for encoding the conflicting fact. Through a rigorous experimental methodology on microsoft/Phi-3-mini-4k-instruct, we demonstrate that this mechanistically informed two-stage approach achieves near-perfect accuracy (98.50%) for the new, modulated fact while simultaneously effectively suppressing the original conflicting fact (96.00% forget rate). Critically, our strategy exhibits unprecedented localization (72.00% F_control accuracy), dramatically mitigating catastrophic forgetting observed in direct fine-tuning approaches (which showed as low as ~20% F_control accuracy), a direct benefit of our targeted interpretability-guided intervention. Furthermore, qualitative analysis reveals a nuanced mechanism of "soft forgetting," where original knowledge is suppressed from default retrieval but remains latent and conditionally accessible, enhancing model safety and control. These findings represent a significant advancement towards precise, localized, and safe knowledge management in compact LLMs.
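
The abstract names $IA^3$ as the fine-tuning technique but does not describe how it acts on the model. As a rough illustration of the general $IA^3$ idea only (learned vectors that rescale inner activations element-wise, inhibiting some dimensions and amplifying others), and not the paper's implementation, a minimal sketch follows. The function name `ia3_scale`, the 4-dimensional activation, and all vector values are hypothetical and chosen purely for illustration.

```python
from typing import List


def ia3_scale(activations: List[float], scale: List[float]) -> List[float]:
    """Rescale one activation vector element-wise by a learned IA^3 vector."""
    if len(activations) != len(scale):
        raise ValueError("activation and scaling vectors must have the same length")
    return [a * s for a, s in zip(activations, scale)]


# Hypothetical activation taken from one component of the model.
hidden = [0.5, -1.0, 2.0, 0.25]

# IA^3 vectors are initialized to ones, so a fresh adapter is an identity map.
identity = [1.0, 1.0, 1.0, 1.0]

# Made-up "trained" vector: entries below 1 inhibit, entries above 1 amplify.
trained = [1.0, 0.5, 1.5, 1.0]

unchanged = ia3_scale(hidden, identity)
modulated = ia3_scale(hidden, trained)

print(unchanged)
print(modulated)
```

Because only the scaling vectors are trained, the base weights stay frozen. In the strategy the abstract describes, such vectors would presumably be attached only to the components picked out by the circuit localization phase, but that wiring is an assumption and is not shown here.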

Related papers