Fugu-MT Paper Translation (Summary): Rethinking Robust Adversarial Concept Erasure in Diffusion Models

Paper Summary: Rethinking Robust Adversarial Concept Erasure in Diffusion Models

arxiv url: http://arxiv.org/abs/2510.27285v1
Date: Fri, 31 Oct 2025 08:53:02 GMT
Status: Translation complete
Last updated (in system): 2025-11-03 17:52:16.041923
Title: Rethinking Robust Adversarial Concept Erasure in Diffusion Models
Authors: Qinghong Yin, Yu Tian, Yue Zhang
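Abstract summary: Concept erasure aims to selectively unlearn undesirable content in diffusion models (DMs), reducing the risk of sensitive content generation. Most existing methods apply adversarial training to identify and suppress target concepts, which lowers the likelihood of sensitive outputs. We introduce S-GRACE, which leverages semantic guidance within the concept space to generate adversarial samples and perform erasure training.
Reference score (in-house attention score): 11.734921828002895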
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Concept erasure aims to selectively unlearn undesirable content in diffusion models (DMs) to reduce the risk of sensitive content generation. Adversarial training has become a new paradigm in concept erasure: most existing methods use it to identify and suppress target concepts, thus reducing the likelihood of sensitive outputs. However, these methods often neglect the specificity of adversarial training in DMs, resulting in only partial mitigation. In this work, we investigate and quantify this specificity from the perspective of concept space, i.e., can adversarial samples truly fit the target concept space? We observe that existing methods neglect the role of conceptual semantics when generating adversarial samples, resulting in ineffective fitting of concept spaces. This oversight leads to the following issues: 1) when there are few adversarial samples, they fail to comprehensively cover the object concept; 2) conversely, when there are many, they disrupt other target concept spaces. Motivated by these findings, we introduce S-GRACE (Semantics-Guided Robust Adversarial Concept Erasure), which leverages semantic guidance within the concept space to generate adversarial samples and perform erasure training. Experiments conducted with seven state-of-the-art methods and three adversarial prompt generation strategies across various DM unlearning scenarios demonstrate that S-GRACE significantly improves erasure performance by 26%, better preserves non-target concepts, and reduces training time by 90%. Our code is available at https://github.com/Qhong-522/S-GRACE.
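The abstract describes the adversarial-training paradigm for concept erasure only at a high level. As a rough illustration of that generic min-max loop (this is not S-GRACE and not any specific method from the paper), here is a toy NumPy sketch under loudly stated assumptions: a linear "concept score" `w @ e` stands in for the diffusion model, the target concept is a single embedding `c`, adversarial samples are L2-bounded perturbations of `c` found by projected gradient ascent, and a quadratic penalty keeps scores on a few "retain" embeddings close to their original values. All function and parameter names (`inner_max`, `adversarial_erasure`, `eps`, `lam`) are hypothetical.

```python
import numpy as np


def inner_max(w, c, eps, steps=10, alpha=None):
    """Hypothetical PGD adversary: find delta with ||delta|| <= eps that
    maximizes the toy concept activation (w . (c + delta))**2."""
    alpha = eps / 4 if alpha is None else alpha
    delta = np.zeros_like(c)
    for _ in range(steps):
        grad = 2.0 * (w @ (c + delta)) * w       # gradient of the activation w.r.t. delta
        norm = np.linalg.norm(grad)
        if norm == 0.0:
            break
        delta = delta + alpha * grad / norm       # normalized ascent step
        dn = np.linalg.norm(delta)
        if dn > eps:                              # project back onto the L2 ball
            delta = delta * (eps / dn)
    return c + delta


def adversarial_erasure(w0, c, retain, eps=0.5, lam=1.0, lr=0.05, epochs=300):
    """Hypothetical outer loop: suppress the activation on adversarial samples
    of the target concept while keeping retain-set scores close to w0's."""
    w = w0.copy()
    targets = retain @ w0                         # original scores to preserve
    for _ in range(epochs):
        x_adv = inner_max(w, c, eps)              # inner maximization
        grad = 2.0 * (w @ x_adv) * x_adv          # erase term, adversarial sample held fixed
        grad += 2.0 * lam * retain.T @ (retain @ w - targets)  # preservation term
        w = w - lr * grad                         # outer minimization step
    return w


def adv_activation(w, c, eps):
    """Worst-case toy activation of the target concept within the eps-ball."""
    x_adv = inner_max(w, c, eps)
    return float((w @ x_adv) ** 2)


rng = np.random.default_rng(0)
d = 8
w0 = rng.normal(size=d)                           # toy "model" before erasure
c = rng.normal(size=d)
c /= np.linalg.norm(c)                            # toy target-concept embedding
retain = rng.normal(size=(3, d))
retain /= np.linalg.norm(retain, axis=1, keepdims=True)  # toy non-target embeddings

w = adversarial_erasure(w0, c, retain)
print("adversarial activation before:", adv_activation(w0, c, 0.5))
print("adversarial activation after: ", adv_activation(w, c, 0.5))
print("max retain drift:", float(np.max(np.abs(retain @ w - retain @ w0))))
```

In this toy the adversary only perturbs around the single point `c`, so it has no notion of how the target concept space is shaped. That is the kind of gap the abstract points to: existing methods ignore conceptual semantics when generating adversarial samples, and S-GRACE instead uses semantic guidance within the concept space.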
