Fugu-MT Paper Translation (Abstract): RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

Paper overview: RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

arxiv url: http://arxiv.org/abs/2604.12820v1
Date: Tue, 14 Apr 2026 14:44:45 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.510349
Title: RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair
Title (reference translation): RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair
Authors: Jagadeesh Rachapudi, Pranav Singh, Ritali Vatsi, Praful Hambarde, Amit Shukla
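Abstract summary: Large language models (LLMs) inherently absorb harmful knowledge, misinformation, and personal data during pretraining on large-scale web corpora. Interactive Machine Unlearning (IMU) is a new paradigm in which users instruct LLMs to forget targeted knowledge through natural language at inference time. RePAIR comprises (i) a watchdog model for unlearning intent detection, (ii) a surgeon model that generates repair procedures, and (iii) a patient model whose parameters are updated autonomously.
Reference score (independently computed attention score): 1.7118181664522618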
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Large language models (LLMs) inherently absorb harmful knowledge, misinformation, and personal data during pretraining on large-scale web corpora, with no native mechanism for selective removal. While machine unlearning offers a principled solution, existing approaches are provider-centric, requiring retraining pipelines, curated retain datasets, and direct intervention by model service providers (MSPs), thereby excluding end users from controlling their own data. We introduce Interactive Machine Unlearning (IMU), a new paradigm in which users can instruct LLMs to forget targeted knowledge through natural language at inference time. To realize IMU, we propose RePAIR, a prompt-aware model repair framework comprising (i) a watchdog model for unlearning intent detection, (ii) a surgeon model for generating repair procedures, and (iii) a patient model whose parameters are updated autonomously. At the core of RePAIR, we develop Steering Through Activation Manipulation with PseudoInverse (STAMP), a training-free, single-sample unlearning method that redirects MLP activations toward a refusal subspace via closed-form pseudoinverse updates. Its low-rank variant reduces computational complexity from O(d^3) to O(r^3 + r^2 * d), enabling efficient on-device unlearning with up to ~3x speedup over training-based baselines. Extensive experiments across harmful knowledge suppression, misinformation correction, and personal data erasure demonstrate that RePAIR achieves near-zero forget scores (Acc_f = 0.00, F-RL = 0.00) while preserving model utility (Acc_r up to 84.47, R-RL up to 0.88), outperforming six state-of-the-art baselines. These results establish RePAIR as an effective and practical framework for user-driven model editing, advancing transparent and on-device control over learned knowledge, with potential extensions to multimodal foundation models.
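Abstract (reference translation): Large language models (LLMs) inherently absorb harmful knowledge, misinformation, and personal data during pretraining on large-scale web corpora. Machine unlearning offers a principled solution, but existing approaches are provider-centric, requiring retraining pipelines, curated retain datasets, and direct intervention by model service providers (MSPs). Interactive Machine Unlearning (IMU) is a new paradigm in which users instruct LLMs to forget targeted knowledge through natural language at inference time. To realize IMU, we propose RePAIR, which comprises (i) a watchdog model for unlearning intent detection, (ii) a surgeon model for generating repair procedures, and (iii) a patient model whose parameters are updated autonomously. At the core of RePAIR, we develop Steering Through Activation Manipulation with PseudoInverse (STAMP), a training-free, single-sample unlearning method that redirects MLP activations toward a refusal subspace via closed-form pseudoinverse updates. Its low-rank variant reduces computational complexity from O(d^3) to O(r^3 + r^2 * d), enabling efficient on-device unlearning with up to ~3x speedup over training-based baselines. Extensive experiments on harmful knowledge suppression, misinformation correction, and personal data erasure show that RePAIR achieves near-zero forget scores (Acc_f = 0.00, F-RL = 0.00) while preserving model utility (Acc_r up to 84.47, R-RL up to 0.88), outperforming six state-of-the-art baselines. These results establish RePAIR as an effective and practical framework for user-driven model editing, enabling transparent, on-device control over learned knowledge, with potential extension to multimodal foundation models.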
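
The abstract does not spell out STAMP's update rule, so the following is only a minimal sketch of the general idea behind a "closed-form pseudoinverse update", not the authors' method. It assumes a single linear layer W, one activation k produced by the prompt to be forgotten, and a target output v_target standing in for a point in the refusal subspace. All names and numbers are hypothetical. The update W' = W + (v_target - W k) k^+, where k^+ = k^T / (k^T k) is the pseudoinverse of the column vector k, is the smallest (Frobenius-norm) change to W that maps k to v_target, and it leaves outputs unchanged for inputs orthogonal to k.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def rank_one_redirect(W, k, v_target):
    """Return W' = W + (v_target - W k) k^+, with k^+ = k^T / (k^T k).

    k^+ is the Moore-Penrose pseudoinverse of the single column vector k,
    so W' k equals v_target and W' x equals W x for any x orthogonal to k.
    """
    k_sq = sum(k_j * k_j for k_j in k)
    if k_sq == 0.0:
        raise ValueError("activation k must be non-zero")
    # Gap between the desired output and the current output for k.
    residual = [t - y for t, y in zip(v_target, matvec(W, k))]
    # Rank-one correction: outer product of residual and k, scaled by 1/|k|^2.
    return [
        [w_ij + r_i * k_j / k_sq for w_ij, k_j in zip(row, k)]
        for row, r_i in zip(W, residual)
    ]


# Hypothetical toy values: a 2x3 layer, one "forget" activation, one target.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
k_forget = [1.0, 2.0, 2.0]   # activation for the prompt to be forgotten
v_refuse = [0.0, 5.0]        # stand-in for a refusal-direction output
k_other = [2.0, -1.0, 0.0]   # orthogonal to k_forget, so it is unaffected

W_new = rank_one_redirect(W, k_forget, v_refuse)
out_forget = matvec(W_new, k_forget)   # redirected to v_refuse
out_other = matvec(W_new, k_other)     # same as matvec(W, k_other)
```

Because only one activation vector is involved, the pseudoinverse here reduces to a division by the squared norm of k. The O(d^3) and O(r^3 + r^2 * d) costs quoted in the abstract refer to the paper's own full and low-rank formulations, which this sketch does not reproduce.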

Related papers