Fugu-MT Paper Translation (Abstract): What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers

Paper overview: What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers

arxiv url: http://arxiv.org/abs/2605.10180v1
Date: Mon, 11 May 2026 08:31:57 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.646765
Title: What Concepts Lie Within? Detecting and Suppressing Risky Content in Diffusion Transformers
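Title (reference translation): Detecting and Suppressing Risky Content in Diffusion Transformers
Authors: Chenyu Zhang, Lanjun Wang, Yueyang Cheng, Ruidong Chen, Wenhui Li, An-an Liu
Abstract summary: AHV-D&S is a training-free inference-time safeguard for image generation in DiTs. AHV-D&S is shown to effectively suppress sexual, copyrighted-style, and various harmful content while preserving visual quality.
Reference score (independently computed attention score): 41.55824439218019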
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rise of text-to-image (T2I) models has increasingly raised concerns regarding the generation of risky content, such as sexual, violent, and copyright-protected images, highlighting the need for effective safeguards within the models themselves. Although existing methods have been proposed to eliminate risky concepts from T2I models, they are primarily developed for earlier U-Net architectures, leaving the state-of-the-art Diffusion-Transformer-based T2I models inadequately protected. This gap stems from a fundamental architectural shift: Diffusion Transformers (DiTs) entangle semantic injection and visual synthesis via joint attention, which makes it difficult to isolate and erase risky content within the generation. To bridge this gap, we investigate how semantic concepts are represented in DiTs and discover that attention heads exhibit concept-specific sensitivity. This property enables both the detection and suppression of risky content. Building on this discovery, we propose AHV-D&S, a training-free inference-time safeguard for image generation in DiTs. Specifically, AHV-D&S quantifies each textual token's sensitivity across all attention heads as an Attention Head Vector (AHV), which serves as a discriminative signature for detecting risky generation tendencies. In the inference stage, we propose a momentum-based strategy to dynamically track token-wise AHVs across denoising steps, and a sensitivity-guided adaptive suppression strategy that suppresses the attention weights of identified risky tokens based on head-specific risk scores. Extensive experiments demonstrate that AHV-D&S effectively suppresses sexual, copyrighted-style, and various harmful content while preserving visual quality, and further exhibits strong robustness against adversarial prompts and transferability across different DiT-based T2I models.
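Abstract (reference translation): The rise of text-to-image (T2I) models has heightened concerns about the generation of risky content such as sexual, violent, and copyright-protected images, underscoring the need for effective safeguards within the models themselves. Existing methods have been proposed to remove risky concepts from T2I models, but they were developed mainly for earlier U-Net architectures, leaving state-of-the-art Diffusion-Transformer-based T2I models inadequately protected. This gap stems from a fundamental architectural shift: Diffusion Transformers (DiTs) entangle semantic injection and visual synthesis through joint attention, which makes it difficult to isolate and erase risky content during generation. To bridge this gap, the authors examine how semantic concepts are represented in DiTs and find that attention heads exhibit concept-specific sensitivity. This property makes it possible to both detect and suppress risky content. Building on this finding, they propose AHV-D&S, a training-free inference-time safeguard for image generation in DiTs. Specifically, AHV-D&S quantifies each text token's sensitivity across all attention heads as an Attention Head Vector (AHV), which serves as a discriminative signature for detecting risky generation tendencies. At inference time, they propose a momentum-based strategy that dynamically tracks token-wise AHVs across denoising steps, and a sensitivity-guided adaptive suppression strategy that reduces the attention weights of identified risky tokens based on head-specific risk scores. Extensive experiments show that AHV-D&S effectively suppresses sexual, copyrighted-style, and various harmful content while preserving visual quality, and that it is also strongly robust to adversarial prompts and transfers across different DiT-based T2I models.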
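The abstract names three steps (a per-token Attention Head Vector, momentum-based tracking across denoising steps, and head-wise suppression of attention weights) but gives no formulas. The following is a minimal illustrative sketch of how such a pipeline could look, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the exponential-moving-average form of the momentum update, the cosine-similarity detection rule, the threshold `tau`, the strength `alpha`, and all function names.

```python
import math

def ema_update(prev_ahv, step_sensitivity, beta=0.9):
    """Track one token's AHV across denoising steps.
    Assumption: 'momentum' is modelled as an exponential moving average."""
    if prev_ahv is None:  # first denoising step: nothing to blend with
        return list(step_sensitivity)
    return [beta * p + (1.0 - beta) * s
            for p, s in zip(prev_ahv, step_sensitivity)]

def cosine(u, v):
    """Cosine similarity between two head vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def is_risky(ahv, concept_ahv, tau=0.8):
    """Assumed detection rule: flag a token whose tracked AHV is close to a
    reference AHV of a risky concept. tau is a hypothetical threshold."""
    return cosine(ahv, concept_ahv) >= tau

def suppress(attn, risky_tokens, head_risk, alpha=1.0):
    """attn[h][t] is the attention weight of head h on text token t.
    Risky tokens are scaled by (1 - alpha * head_risk[h]), clamped at 0,
    so heads with a higher risk score are damped more. No renormalization."""
    out = []
    for h, row in enumerate(attn):
        scale = max(0.0, 1.0 - alpha * head_risk[h])
        out.append([w * scale if t in risky_tokens else w
                    for t, w in enumerate(row)])
    return out
```

In this sketch, heads with a risk score near 0 leave the token's attention untouched, which is one simple way to limit the effect on unrelated content. A real DiT would apply this inside joint attention over both text and image tokens, which the sketch does not model.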

Related papers