Fugu-MT Paper Translation (Summary): Mitigating Privacy Risk via Forget Set-Free Unlearning

Paper summary: Mitigating Privacy Risk via Forget Set-Free Unlearning

arxiv url: http://arxiv.org/abs/2604.10636v1
Date: Sun, 12 Apr 2026 13:24:40 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:16.142476
Title: Mitigating Privacy Risk via Forget Set-Free Unlearning
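Title (reference translation): Mitigating privacy risk via forget set-free unlearning
Authors: Aviraj Newatia, Michael Cooper, Viet Nguyen, Rahul G. Krishnan
Abstract summary: Training machine learning models requires storing large datasets that contain sensitive or private data. This work introduces partially-blind unlearning, which uses auxiliary information to unlearn without explicit access to the forget set. We show that Reload unlearns efficiently, approximating models retrained from scratch, and outperforms forget set-dependent approaches.
Reference score (attention score calculated by the site): 11.615403193858503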
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Training machine learning models requires the storage of large datasets, which often contain sensitive or private data. Storing data is associated with a number of potential risks which increase over time, such as database breaches and malicious adversaries. Machine unlearning is the study of methods to efficiently remove the influence of training data subsets from previously-trained models. Existing unlearning methods typically require direct access to the "forget set" (the data to be forgotten), so organisations must retain this data for unlearning rather than deleting it immediately upon request, increasing the risks associated with the forget set. We introduce partially-blind unlearning, which utilizes auxiliary information to unlearn without explicit access to the forget set. We also propose Reload, a practical partially-blind framework based on gradient optimization and structured weight sparsification that operationalizes partially-blind unlearning. We show that Reload efficiently unlearns, approximating models retrained from scratch, and outperforms several forget set-dependent approaches. On language models, Reload unlearns entities on Llama2-7B using <0.025% of the retain set and <7% of model weights in <8 minutes. In the corrective case, Reload achieves unlearning even when only 10% of corrupted data is identified.
Abstract (reference translation): Training machine learning models requires storing large datasets, which often contain sensitive or private data. Storing data carries potential risks that grow over time, such as database breaches and malicious adversaries. Machine unlearning is the study of methods for efficiently removing the influence of training data subsets from previously trained models. Existing unlearning methods usually need direct access to the "forget set" (the data to be forgotten), so organisations must keep this data for unlearning instead of deleting it immediately upon request, which increases the risks associated with the forget set. We introduce partially-blind unlearning, which uses auxiliary information to unlearn without explicit access to the forget set. We also propose Reload, a practical partially-blind framework based on gradient optimization and structured weight sparsification that puts partially-blind unlearning into practice. We show that Reload unlearns efficiently, approximating models retrained from scratch, and outperforms several forget set-dependent approaches. On language models, Reload unlearns entities on Llama2-7B in under 8 minutes using less than 0.025% of the retain set and less than 7% of the model weights. In the corrective case, Reload achieves unlearning even when only 10% of the corrupted data is identified.
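The abstract says only that Reload relies on auxiliary information instead of the forget set, together with gradient optimization and structured weight sparsification. The following is a minimal, hypothetical Python sketch of one way such ingredients could fit together; it is not the authors' implementation. It assumes that (1) the auxiliary information is a training-loss gradient summed over the full training set and cached before any forget request, so the forget-set gradient follows from full = retain + forget; (2) the retain-set gradient is estimated from a small sample scaled up to the retain set size, in the spirit of using a tiny fraction of the retain set; and (3) weights are scored per contiguous block with a SNIP-style |w * g| saliency and the top-scoring blocks are zeroed. All function names, the saliency rule, and the zero reset are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def estimate_retain_sum(sample_grads: Sequence[Sequence[float]], retain_size: int) -> List[float]:
    """Hypothetical helper: scale the mean gradient of a small retain sample to the full retain set."""
    n = len(sample_grads)
    if n == 0:
        raise ValueError("need at least one retain sample")
    dim = len(sample_grads[0])
    mean = [sum(g[i] for g in sample_grads) / n for i in range(dim)]
    return [retain_size * m for m in mean]


def forget_grad_from_cache(full_grad_sum: Sequence[float], retain_grad_sum: Sequence[float]) -> List[float]:
    """Assumes summed losses, so full = retain + forget and forget = full - retain (no forget data needed)."""
    if len(full_grad_sum) != len(retain_grad_sum):
        raise ValueError("gradient vectors must have the same length")
    return [f - r for f, r in zip(full_grad_sum, retain_grad_sum)]


def block_saliency(weights: Sequence[float], forget_grad: Sequence[float], block_size: int) -> List[float]:
    """Score each contiguous block of weights by sum |w_i * g_i| (a SNIP-style saliency, swapped in here)."""
    return [
        sum(abs(w * g) for w, g in zip(weights[s:s + block_size], forget_grad[s:s + block_size]))
        for s in range(0, len(weights), block_size)
    ]


def reset_top_blocks(
    weights: Sequence[float], scores: Sequence[float], block_size: int, fraction: float
) -> Tuple[List[float], List[int]]:
    """Zero the highest-scoring blocks; return the new weights and the indices of reset blocks."""
    k = max(1, int(len(scores) * fraction))  # floor(fraction * number of blocks), but at least one block
    top = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)[:k]
    new_weights = list(weights)
    for b in top:
        for i in range(b * block_size, min((b + 1) * block_size, len(weights))):
            new_weights[i] = 0.0  # hypothetical reset; a real method might re-initialize instead
    return new_weights, sorted(top)


# Toy example with made-up numbers (six weights in three blocks of two).
weights = [0.5, -1.0, 2.0, 0.25, 0.5, -0.5]
full_grad_sum = [1.5, 2.0, 1.5, 0.5, 4.0, 1.25]  # hypothetical cached gradient summed over all training data
sample = [[0.25, 0.5, 0.125, 0.125, 0.25, 0.25],
          [0.25, 0.5, 0.125, 0.125, 0.75, 0.25]]  # gradients of two retain examples
retain_sum = estimate_retain_sum(sample, retain_size=4)
forget_grad = forget_grad_from_cache(full_grad_sum, retain_sum)
scores = block_saliency(weights, forget_grad, block_size=2)
new_weights, reset = reset_top_blocks(weights, scores, block_size=2, fraction=0.34)
print(forget_grad)         # [0.5, 0.0, 1.0, 0.0, 2.0, 0.25]
print(scores)              # [0.25, 2.0, 1.125]
print(reset, new_weights)  # [1] [0.5, -1.0, 0.0, 0.0, 0.5, -0.5]
```

The point of the subtraction step is that the forget set itself never has to be stored: only an aggregate gradient and a small retain sample are used. A complete method would still need an optimization step afterwards to recover performance on retained data, which this sketch omits.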
