Fugu-MT Paper Translation (Abstract): Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning

Paper overview: Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning

arxiv url: http://arxiv.org/abs/2605.00364v1
Date: Fri, 01 May 2026 02:59:03 GMT
Status: Translation complete
Last updated in system: 2026-05-04 17:43:28.830068
Title: Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning
Authors: Jiawei Wu, DouDou Zhou
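Abstract summary: TokenUnlearn is a token-level attribution framework that identifies and selectively targets critical tokens. The approach combines knowledge-aware signals obtained via masking with entropy-aware signals to yield importance scores for precise token selection.
Reference score (attention measure computed independently by this site): 5.454773103061359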
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine unlearning has emerged as a critical capability for addressing privacy, safety, and regulatory concerns in large language models (LLMs). Existing methods operate at the sequence level, applying uniform updates across all tokens despite only a subset encoding the knowledge targeted for removal. This introduces gradient noise, degrades utility, and leads to suboptimal forgetting. We propose TokenUnlearn, a token-level attribution framework that identifies and selectively targets critical tokens. Our approach combines knowledge-aware signals via masking, and entropy-aware signals to yield importance scores for precise token selection. We develop two complementary strategies: hard selection, applying unlearning only to high-importance tokens, and soft weighting, modulating gradient contributions based on importance scores. Both extend existing methods to token-level variants. Theoretical analysis shows token-level selection improves gradient signal-to-noise ratio. Experiments on TOFU and WMDP benchmarks across three model architectures demonstrate consistent improvements over sequence-level baselines in both forgetting effectiveness and utility preservation.
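
The two strategies named in the abstract, hard selection and soft weighting, can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything concrete here is an assumption made for illustration: the helper names (`min_max`, `importance_scores`, `hard_selection_loss`, `soft_weighting_loss`), the convex mix controlled by `alpha`, the `keep_ratio` cutoff, and the convention that both input signals are already oriented so that larger means more important. The per-token losses stand in for whatever sequence-level unlearning objective is being extended.

```python
import math


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def importance_scores(knowledge_signal, entropy_signal, alpha=0.5):
    """Hypothetical combination rule: convex mix of the two normalized signals.

    ASSUMPTION: the paper's actual formula is not stated in the abstract.
    """
    k = min_max(knowledge_signal)
    e = min_max(entropy_signal)
    return [alpha * ki + (1.0 - alpha) * ei for ki, ei in zip(k, e)]


def hard_selection_loss(token_losses, scores, keep_ratio=0.5):
    """Average the per-token loss over only the highest-scoring tokens."""
    n_keep = max(1, math.ceil(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = ranked[:n_keep]
    return sum(token_losses[i] for i in kept) / n_keep


def soft_weighting_loss(token_losses, scores):
    """Importance-weighted average of per-token losses.

    Falls back to the uniform (sequence-level) mean if all scores are zero.
    """
    total = sum(scores)
    if total == 0.0:
        return sum(token_losses) / len(token_losses)
    return sum(s * l for s, l in zip(scores, token_losses)) / total


# Toy values for a four-token sequence (illustrative only, not from the paper).
token_losses = [2.0, 0.5, 3.0, 0.1]
knowledge_signal = [0.9, 0.1, 0.8, 0.0]
entropy_signal = [0.2, 0.4, 1.0, 0.0]

scores = importance_scores(knowledge_signal, entropy_signal)
hard = hard_selection_loss(token_losses, scores, keep_ratio=0.5)
soft = soft_weighting_loss(token_losses, scores)
uniform = sum(token_losses) / len(token_losses)
```

In this toy case both token-level variants concentrate the objective on the first and third tokens, whereas the uniform mean gives all four tokens equal weight. In a real training loop the per-token losses would be differentiable tensors, and the scores would typically be treated as constants so that gradients flow only through the losses.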
