Fugu-MT paper translation (abstract): Feature-Selective Representation Misdirection for Machine Unlearning

Paper overview: Feature-Selective Representation Misdirection for Machine Unlearning

arxiv url: http://arxiv.org/abs/2512.16297v1
Date: Thu, 18 Dec 2025 08:31:50 GMT
Status: translation complete
Last updated in system: 2025-12-19 18:10:31.982578
Title: Feature-Selective Representation Misdirection for Machine Unlearning
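Title (reference translation): Feature-Selective Representation Misdirection for Machine Unlearning
Authors: Taozhao Chen, Linghan Huang, Kim-Kwang Raymond Choo, Huaming Chen
Abstract summary: Machine unlearning helps ensure that deployed models comply with evolving legal, safety, and governance requirements. Current unlearning techniques assume a clean separation between forget and retain datasets. This paper proposes Selective Representation Misdirection for Unlearning (SRMU), an activation-editing framework.
Reference score (independently computed attention metric): 34.167873590478074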
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As large language models (LLMs) are increasingly adopted in safety-critical and regulated sectors, the retention of sensitive or prohibited knowledge introduces escalating risks, ranging from privacy leakage and regulatory non-compliance to potential misuse. Recent studies suggest that machine unlearning can help ensure deployed models comply with evolving legal, safety, and governance requirements. However, current unlearning techniques assume a clean separation between forget and retain datasets, which is challenging in operational settings characterized by highly entangled distributions. In such scenarios, perturbation-based methods often degrade general model utility or fail to ensure safety. To address this, we propose Selective Representation Misdirection for Unlearning (SRMU), a novel, principled activation-editing framework that enforces feature-aware and directionally controlled perturbations. Unlike indiscriminate perturbations of model weights, SRMU employs a structured misdirection vector with an activation importance map. This allows SRMU to selectively suppress harmful representations while preserving utility on benign ones. Experiments are conducted on the widely used WMDP benchmark across low- and high-entanglement configurations. Empirical results reveal that SRMU delivers state-of-the-art unlearning performance with minimal utility loss, and remains effective under 20-30% overlap, where existing baselines collapse. SRMU provides a robust foundation for safety-driven model governance, privacy compliance, and controlled knowledge removal in emerging LLM-based applications. We release the replication package at https://figshare.com/s/d5931192a8824de26aff.
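Abstract (reference translation): As large language models (LLMs) are increasingly adopted in safety-critical and regulated sectors, the retention of sensitive or prohibited knowledge brings escalating risks, from privacy leakage and regulatory violations to potential misuse. Recent work suggests that machine unlearning can help ensure that deployed models comply with evolving legal, safety, and governance requirements. However, current unlearning techniques assume a clear separation between forget and retain datasets, which is difficult in operational environments characterized by highly entangled distributions. In such scenarios, perturbation-based methods often degrade general model utility or fail to ensure safety. This paper therefore proposes Selective Representation Misdirection for Unlearning (SRMU), a new activation-editing framework that enforces feature-aware, directionally controlled perturbations. Unlike indiscriminate weight perturbations, SRMU uses a structured misdirection vector together with an activation importance map. The goal of SRMU is to selectively suppress harmful representations while maintaining utility on benign ones. Experiments were run on the widely used WMDP benchmark in two configurations, low entanglement and high entanglement. The empirical results show that SRMU delivers state-of-the-art unlearning performance with minimal utility loss and remains effective under 20-30% overlap, where existing baselines collapse. SRMU provides a robust foundation for safety-driven model governance, privacy compliance, and controlled knowledge removal in emerging LLM-based applications. The replication package is released at https://figshare.com/s/d5931192a8824de26aff.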
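
Illustrative sketch (not the authors' implementation): the abstract describes a misdirection vector weighted by an activation importance map but gives no formulas. The NumPy code below is one possible reading under stated assumptions. The importance definition (ratio of mean absolute forget activations to forget plus retain activations), the function names `importance_map`, `misdirection_target`, and `srmu_style_loss`, and the parameters `scale` and `alpha` are all hypothetical and do not come from the paper.

```python
import numpy as np


def importance_map(forget_acts, retain_acts, eps=1e-8):
    """Hypothetical per-feature importance in [0, 1).

    Assumption: a feature matters for forgetting when it is strongly
    active on forget data relative to retain data.
    Inputs have shape (num_samples, num_features).
    """
    f = np.abs(forget_acts).mean(axis=0)
    r = np.abs(retain_acts).mean(axis=0)
    return f / (f + r + eps)


def misdirection_target(direction, importance, scale):
    """Scale a unit direction feature-wise by the importance map.

    Features with importance near 0 get a target near 0, so the push
    is concentrated on features tied to the forget data.
    """
    unit = direction / np.linalg.norm(direction)
    return scale * importance * unit


def srmu_style_loss(forget_new, retain_new, retain_ref, target, alpha):
    """Assumed two-term objective in the style of representation misdirection.

    forget term: pull updated forget activations toward the target.
    retain term: keep updated retain activations close to the reference model.
    """
    forget_term = np.mean((forget_new - target) ** 2)
    retain_term = np.mean((retain_new - retain_ref) ** 2)
    return forget_term + alpha * retain_term
```

In this sketch a feature that fires only on forget data receives an importance close to 1 and a feature that fires only on retain data receives 0, which is how the perturbation stays selective when the two datasets overlap.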

Related papers