Fugu-MT paper translation (abstract): UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models

Paper abstract: UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models

arxiv url: http://arxiv.org/abs/2505.15674v1
Date: Wed, 21 May 2025 15:53:28 GMT
Status: Translation complete
Last updated in system: 2025-05-22 15:42:59.754943
Title: UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models
Title (reference translation): UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models
Authors: Miao Yu, Liang Lin, Guibin Zhang, Xinfeng Li, Junfeng Fang, Ningyu Zhang, Kun Wang, Yang Wang
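Abstract summary: We introduce UniErase, a novel unlearning paradigm that uses a learnable parametric suffix (unlearning token) to steer language models toward targeted forgetting behaviors. UniErase achieves state-of-the-art (SOTA) performance across batch, sequential, and precise unlearning under fictitious and real-world knowledge settings.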
Reference score (independently computed attention score): 54.75551043657238
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models require iterative updates to address challenges such as knowledge conflicts and outdated information (e.g., incorrect, private, or illegal content). Machine unlearning provides a systematic methodology for targeted knowledge removal from trained models, enabling elimination of sensitive information influences. However, mainstream fine-tuning-based unlearning methods often fail to balance unlearning efficacy and model ability, frequently resulting in catastrophic model collapse under extensive knowledge removal. Meanwhile, in-context unlearning, which relies solely on contextual prompting without modifying the model's intrinsic mechanisms, suffers from limited generalizability and struggles to achieve true unlearning. In this work, we introduce UniErase, a novel unlearning paradigm that employs a learnable parametric suffix (unlearning token) to steer language models toward targeted forgetting behaviors. UniErase operates through two key phases: (I) an optimization stage that binds desired unlearning outputs to the model's autoregressive probability distribution via token optimization, followed by (II) a lightweight model editing phase that activates the learned token to probabilistically induce the specified forgetting objective. Serving as a new research direction for token learning to induce unlearning targets, UniErase achieves state-of-the-art (SOTA) performance across batch, sequential, and precise unlearning under fictitious and real-world knowledge settings. Remarkably, on the TOFU benchmark, UniErase, modifying only around 3.66% of the LLM parameters, outperforms the previous forgetting SOTA baseline by around 4.01 times for model ability with even better unlearning efficacy. Similarly, UniErase, maintaining more ability, also surpasses the previous retaining SOTA by 35.96% for unlearning efficacy, showing dual top-tier performance in the current unlearning domain.
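Abstract (reference translation): Large language models need iterative updates to deal with problems such as knowledge conflicts and outdated information (for example, incorrect, private, or illegal content). Machine unlearning offers a systematic methodology for removing targeted knowledge from trained models. However, mainstream fine-tuning-based unlearning methods fail to balance unlearning efficacy against model ability, and often suffer catastrophic model collapse under large-scale knowledge removal. In-context unlearning, by contrast, relies only on contextual prompting without changing the model's intrinsic mechanisms, but it generalizes poorly and struggles to achieve true unlearning. In this work we introduce UniErase, a new unlearning paradigm that uses a learnable parametric suffix (unlearning token) to steer language models toward targeted forgetting behaviors. UniErase consists of (I) an optimization stage that binds the desired unlearning outputs to the model's autoregressive probability distribution via token optimization, and (II) a lightweight model editing stage that activates the learned token to probabilistically induce the specified forgetting objective. As a new research direction for token learning, UniErase achieves state-of-the-art (SOTA) performance in batch, sequential, and precise unlearning under fictitious and real-world knowledge settings. Notably, on the TOFU benchmark, UniErase modifies only about 3.66% of the LLM parameters while outperforming the previous forgetting SOTA baseline by a factor of about 4.01 in model ability. Likewise, UniErase retains more ability and surpasses the previous retaining SOTA by 35.96% in unlearning efficacy, showing two top-level results in the current unlearning domain.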
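
The idea behind phase (I) in the abstract, optimizing a token so that a frozen model is steered toward a chosen "forgetting" output, can be illustrated with a toy sketch. This is not the authors' implementation: the single linear layer standing in for the model, the additive suffix vector standing in for the unlearning token, and every name below (`learn_suffix`, `FORGET`, `W`, `prompts`) are assumptions made only for illustration. The model editing of phase (II) is not covered.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(W, hidden):
    """Frozen toy 'model': one linear map from a hidden vector to logits."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in W]

def learn_suffix(W, prompts, target, dim, steps=2000, lr=0.1):
    """Gradient descent on the suffix vector only; W and prompts stay frozen."""
    suffix = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for prompt in prompts:
            hidden = [p + s for p, s in zip(prompt, suffix)]
            probs = softmax(forward(W, hidden))
            # Gradient of -log p[target] w.r.t. logits is probs - onehot(target)
            dlogits = [p - (1.0 if k == target else 0.0)
                       for k, p in enumerate(probs)]
            # Chain rule through the linear map: grad_hidden = W^T @ dlogits
            for j in range(dim):
                grad[j] += sum(dlogits[k] * W[k][j] for k in range(len(W)))
        suffix = [s - lr * g / len(prompts) for s, g in zip(suffix, grad)]
    return suffix

random.seed(0)
VOCAB, DIM = 5, 4
FORGET = 0  # hypothetical index of an "I don't know"-style output
W = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
prompts = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]

suffix = learn_suffix(W, prompts, FORGET, DIM)

# Probability of the forgetting output without and with the learned suffix
before = [softmax(forward(W, p))[FORGET] for p in prompts]
after = [softmax(forward(W, [a + b for a, b in zip(p, suffix)]))[FORGET]
         for p in prompts]
print("before:", before)
print("after: ", after)
```

Only `suffix` is updated, so the "model" weights are untouched during this stage. The sketch shows the steering effect in isolation and says nothing about how the paper's method preserves model ability.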

Related papers