Fugu-MT Paper Translation (Abstract): RepSelect: Robust LLM Unlearning via Representation Selectivity

Paper summary: RepSelect: Robust LLM Unlearning via Representation Selectivity

arxiv url: http://arxiv.org/abs/2606.17168v2
Date: Sat, 20 Jun 2026 10:01:32 GMT
Status: translation complete
Last updated in system: 2026-06-24 22:16:48.216174
Title: RepSelect: Robust LLM Unlearning via Representation Selectivity
Title (reference translation): RepSelect: Robust LLM Unlearning via Representation Selectivity
Authors: Filip Sondej, Yushi Yang, Adam Mahdi
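Abstract summary: Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. The paper proposes RepSelect, which isolates forget-set-specific representations by collapsing the top principal components of weight gradients before each update. RepSelect achieves a 4-50x larger reduction in post-relearning answer accuracy than the strongest baseline and is near-perfectly robust to few-shot prompting attacks.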
Reference score (attention score computed by this site): 2.6963769910722046
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. Current methods are easily reversed by fine-tuning or few-shot prompting, suggesting their forgetting is only shallow. We identify the root cause. Existing methods target representations shared with both the retain set and the subspace recovered by a fine-tuning attacker, making unlearning both disruptive to general capabilities and easy to reverse. We propose RepSelect (Representation Selectivity), which isolates forget-set-specific representations by collapsing top principal components of weight gradients before each update, leaving general capabilities intact while limiting what fine-tuning can recover. We evaluate across two forget categories, biohazardous knowledge and abusive tendencies, and four model families spanning dense and Mixture-of-Experts architectures (Llama 3, Qwen 3.5, Gemma 4 E4B, DeepSeek V2 Lite). Compared to five popular baselines (GradDiff, NPO, SimNPO, RMU, UNDIAL), RepSelect achieves a 4-50x larger reduction in post-relearning answer accuracy than the strongest baseline, and is near-perfectly robust to few-shot prompting attacks. Targeting selective representations is thus an important step towards deep and robust LLM forgetting.
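Abstract (reference translation): Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities is a key challenge in unlearning. Current methods are easily reversed by fine-tuning or few-shot prompting, which suggests their forgetting is only shallow. We identify the root cause: existing methods target representations that are shared with both the retain set and the subspace recovered by a fine-tuning attacker, so unlearning both disrupts general capabilities and is easy to reverse. We propose RepSelect (Representation Selectivity), which isolates forget-set-specific representations by collapsing the top principal components of weight gradients before each update, leaving general capabilities intact while limiting what fine-tuning can recover. We evaluate on two forget categories, biohazardous knowledge and abusive tendencies, and on four model families spanning dense and Mixture-of-Experts architectures (Llama 3, Qwen 3.5, Gemma 4 E4B, DeepSeek V2 Lite). Compared to five popular baselines (GradDiff, NPO, SimNPO, RMU, UNDIAL), RepSelect achieves a 4-50x larger reduction in post-relearning answer accuracy than the strongest baseline and is near-perfectly robust to few-shot prompting attacks. Targeting selective representations is therefore an important step towards deep and robust LLM forgetting.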
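
The abstract describes the core step only in one phrase: collapsing the top principal components of weight gradients before each update. The sketch below is one possible reading of that phrase, not the authors' algorithm. It assumes the gradient of a single weight matrix is available as a 2-D array, takes "principal components" to mean the leading singular directions of that matrix, and uses a plain gradient step. The function name `collapse_top_components`, the choice `k=1`, the learning rate, and the toy data are all hypothetical.

```python
import numpy as np


def collapse_top_components(grad, k):
    """Zero out the k largest singular directions of a 2-D gradient matrix.

    Assumption: this SVD-based filter stands in for "collapsing top
    principal components"; the paper may define the step differently.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    s = s.copy()
    s[:k] = 0.0  # drop the dominant directions, keep the rest unchanged
    return (u * s) @ vt  # rebuild the gradient from the remaining directions


rng = np.random.default_rng(0)

# Toy gradient: one dominant rank-1 direction plus a small residual.
dominant = 10.0 * np.outer(rng.standard_normal(8), rng.standard_normal(6))
residual = 0.1 * rng.standard_normal((8, 6))
grad = dominant + residual

filtered = collapse_top_components(grad, k=1)

# Hypothetical update: an ordinary gradient step using the filtered gradient.
weights = rng.standard_normal((8, 6))
learning_rate = 0.01
weights -= learning_rate * filtered
```

In this toy setup the filtered gradient keeps the shape of the original, and its largest singular value equals the second-largest singular value of the unfiltered gradient, so the update no longer moves the weights along the dominant direction.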

Related papers