Fugu-MT paper translation (abstract): ReLAPSe: Reinforcement-Learning-trained Adversarial Prompt Search for Erased concepts in unlearned diffusion models

Paper overview: ReLAPSe: Reinforcement-Learning-trained Adversarial Prompt Search for Erased concepts in unlearned diffusion models

arxiv url: http://arxiv.org/abs/2602.00350v1
Date: Fri, 30 Jan 2026 21:56:50 GMT
Status: translation complete
Last updated in system: 2026-02-03 19:28:33.130069
Title: ReLAPSe: Reinforcement-Learning-trained Adversarial Prompt Search for Erased concepts in unlearned diffusion models
Title (reference translation): ReLAPSe: Reinforcement-Learning-trained Adversarial Prompt Search for Erased concepts in unlearned diffusion models
Authors: Ignacy Kolton, Kacper Marzol, Paweł Batorski, Marcin Mazur, Paul Swoboda, Przemysław Spurek
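Abstract summary: Machine unlearning is a key defense mechanism for removing unauthorized concepts from text-to-image diffusion models. Existing adversarial approaches that exploit the visual information persisting after unlearning are constrained by fundamental limitations. This paper introduces ReLAPSe, a policy-based adversarial framework that reformulates concept restoration as a reinforcement learning problem.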
Reference score (attention level, computed independently by this site): 12.021923446217722
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine unlearning is a key defense mechanism for removing unauthorized concepts from text-to-image diffusion models, yet recent evidence shows that latent visual information often persists after unlearning. Existing adversarial approaches for exploiting this leakage are constrained by fundamental limitations: optimization-based methods are computationally expensive due to per-instance iterative search. At the same time, reasoning-based and heuristic techniques lack direct feedback from the target model's latent visual representations. To address these challenges, we introduce ReLAPSe, a policy-based adversarial framework that reformulates concept restoration as a reinforcement learning problem. ReLAPSe trains an agent using Reinforcement Learning with Verifiable Rewards (RLVR), leveraging the diffusion model's noise prediction loss as a model-intrinsic and verifiable feedback signal. This closed-loop design directly aligns textual prompt manipulation with latent visual residuals, enabling the agent to learn transferable restoration strategies rather than optimizing isolated prompts. By pioneering the shift from per-instance optimization to global policy learning, ReLAPSe achieves efficient, near-real-time recovery of fine-grained identities and styles across multiple state-of-the-art unlearning methods, providing a scalable tool for rigorous red-teaming of unlearned diffusion models. Some experimental evaluations involve sensitive visual concepts, such as nudity. Code is available at https://github.com/gmum/ReLaPSe
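Abstract (reference translation): Machine unlearning is a key defense mechanism for removing unauthorized concepts from text-to-image diffusion models. Among existing adversarial approaches, optimization-based methods are computationally expensive because of per-instance iterative search, while reasoning-based and heuristic techniques lack direct feedback from the target model's latent visual representations. To address these challenges, ReLAPSe is introduced as a policy-based adversarial framework that reformulates concept restoration as a reinforcement learning problem. ReLAPSe trains an agent using Reinforcement Learning with Verifiable Rewards (RLVR), leveraging the diffusion model's noise prediction loss as a model-intrinsic and verifiable feedback signal. This closed-loop design directly aligns textual prompt manipulation with latent visual residuals, so the agent learns transferable restoration strategies rather than optimizing isolated prompts. By pioneering the shift from per-instance optimization to global policy learning, ReLAPSe achieves efficient, near-real-time recovery of fine-grained identities and styles across multiple state-of-the-art unlearning methods, providing a scalable tool for rigorous red-teaming of unlearned diffusion models. Some experimental evaluations involve sensitive visual concepts, such as nudity. Code is available at https://github.com/gmum/ReLaPSe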
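
The abstract describes using the diffusion model's noise prediction loss as a verifiable reward for a prompt-generating agent. The following is a minimal sketch of that idea only, not the authors' implementation: `toy_predict_noise`, `CONCEPT`, and the trigger token `residual` are hypothetical stand-ins invented for illustration, and the choice of reward as the negative mean squared error is an assumption, since the abstract does not specify sign, scaling, or timestep sampling.

```python
import math
import random

# Hypothetical "erased" concept, represented as a tiny 4-value image.
CONCEPT = [1.0, -1.0, 0.5, 2.0]


def add_noise(x0, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * x + b * e for x, e in zip(x0, eps)]
    return x_t, eps


def toy_predict_noise(x_t, alpha_bar, prompt):
    """Toy stand-in for an unlearned model's noise predictor (assumption).

    It recovers the concept only when the prompt contains the made-up
    trigger token 'residual'; otherwise it assumes an all-zero image.
    """
    x0_hat = CONCEPT if "residual" in prompt else [0.0] * len(x_t)
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - a * g) / b for x, g in zip(x_t, x0_hat)]


def noise_prediction_reward(predict_noise, x0, prompt, alpha_bar, rng):
    """Reward = negative MSE between the sampled noise and the predicted noise."""
    x_t, eps = add_noise(x0, alpha_bar, rng)
    eps_hat = predict_noise(x_t, alpha_bar, prompt)
    mse = sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(eps)
    return -mse


if __name__ == "__main__":
    rng = random.Random(0)
    for prompt in ("a residual photo", "a photo"):
        r = noise_prediction_reward(toy_predict_noise, CONCEPT, prompt, 0.5, rng)
        print(prompt, r)
```

In this toy setup, a prompt that lets the predictor reconstruct the target image yields a reward near zero, while other prompts yield a more negative reward. A policy trained with reinforcement learning could use such a scalar to rank candidate prompts without any external judge.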

Related papers