Fugu-MT Paper Translation (Abstract): Training-Free Anomaly Generation via Dual-Attention Enhancement in Diffusion Model

Paper abstract: Training-Free Anomaly Generation via Dual-Attention Enhancement in Diffusion Model

arxiv url: http://arxiv.org/abs/2508.11550v1
Date: Fri, 15 Aug 2025 15:52:02 GMT
Status: Translation complete
Last updated in system: 2025-08-18 14:51:24.1349
Title: Training-Free Anomaly Generation via Dual-Attention Enhancement in Diffusion Model
Title (reference translation): Training-free anomaly generation via dual-attention enhancement in diffusion models
Authors: Zuo Zuo, Jiahao Dong, Yanyun Qu, Zongze Wu
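Abstract summary: A growing body of work addresses the shortage of anomaly data through anomaly generation. This paper proposes a training-free anomaly generation framework called AAG. AAG builds on the strong generative ability of Stable Diffusion to produce effective anomaly images.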
Reference score (independently computed attention score): 21.461351819711936
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Industrial anomaly detection (AD) plays a significant role in manufacturing, where a long-standing challenge is data scarcity. A growing body of work has emerged to address insufficient anomaly data via anomaly generation. However, these anomaly generation methods either lack fidelity or need to be trained with extra data. To this end, we propose a training-free anomaly generation framework dubbed AAG, which builds on the strong generation ability of Stable Diffusion (SD) for effective anomaly image generation. Given a normal image, a mask, and a simple text prompt, AAG can generate realistic and natural anomalies in the specified regions while keeping the content of other regions unchanged. In particular, we propose Cross-Attention Enhancement (CAE) to re-engineer the cross-attention mechanism within Stable Diffusion based on the given mask. CAE increases the similarity between visual tokens in the specified regions and the text embeddings, which guides the generated visual tokens to follow the text description. In addition, generated anomalies need to look natural and plausible on the object in the given image. We therefore propose Self-Attention Enhancement (SAE), which improves the similarity between each normal visual token and the anomaly visual tokens. SAE ensures that generated anomalies are coherent with the original pattern. Extensive experiments on the MVTec AD and VisA datasets demonstrate the effectiveness of AAG in anomaly generation and its utility. Furthermore, anomaly images generated by AAG can bolster the performance of various downstream anomaly inspection tasks.
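Abstract (reference translation): Industrial anomaly detection (AD) plays an important role in manufacturing, where data scarcity is a long-standing challenge. A growing number of studies address the shortage of anomaly data through anomaly generation. However, these anomaly generation methods either suffer from a lack of fidelity or need to be trained with extra data. This work therefore proposes a training-free anomaly generation framework called AAG, based on the strong image generation ability of Stable Diffusion (SD). Given a normal image, a mask, and a simple text prompt, AAG can generate realistic and natural anomalies in the specified regions while preserving the content of the other regions. In particular, it proposes Cross-Attention Enhancement (CAE), which redesigns the cross-attention mechanism in Stable Diffusion based on the mask. CAE increases the similarity between visual tokens in the specified regions and the text embeddings, guiding the generated visual tokens to follow the text description. In addition, generated anomalies need to be more natural and plausible for the object in the given image. The work proposes Self-Attention Enhancement (SAE), which improves the similarity between each normal visual token and the anomaly visual tokens. SAE ensures that generated anomalies are consistent with the original pattern. Extensive experiments on the MVTec AD and VisA datasets show the effectiveness and utility of AAG for anomaly generation. Furthermore, anomaly images generated by AAG can improve the performance of various downstream anomaly inspection tasks.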
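
Illustrative sketch (not from the paper): The abstract describes CAE and SAE only at a high level, so the following NumPy example is a rough sketch of one way mask-guided attention biasing could look, not the authors' implementation. It assumes that "increasing similarity" can be modeled as adding a positive bias to selected attention logits before the softmax. The function name `biased_attention`, the `boost` parameter, the toy shapes, and the choice to bias anomaly-region queries toward normal-region keys in the self-attention case are all hypothetical. The learned query/key/value projections of a real Stable Diffusion layer are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(q, k, v, row_sel, col_sel, boost):
    """Scaled dot-product attention with an additive logit bias.

    Hypothetical helper: logits at (row, col) pairs where row_sel[row] and
    col_sel[col] are both True are raised by `boost` before the softmax.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    bias = boost * np.outer(row_sel.astype(float), col_sel.astype(float))
    attn = softmax(logits + bias, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
n_visual, n_text, dim = 16, 5, 8              # toy sizes, chosen arbitrarily
visual = rng.normal(size=(n_visual, dim))     # stand-in for visual tokens
text = rng.normal(size=(n_text, dim))         # stand-in for text embeddings

mask = np.zeros(n_visual, dtype=bool)
mask[4:8] = True                              # tokens inside the anomaly mask
anomaly_words = np.array([False, True, True, False, False])  # assumed prompt tokens

# CAE-like step: masked visual tokens attend more to the anomaly words.
_, cae_attn = biased_attention(visual, text, text, mask, anomaly_words, boost=2.0)
_, cae_base = biased_attention(visual, text, text, mask, anomaly_words, boost=0.0)

# SAE-like step (one possible reading): masked tokens attend more to normal tokens.
_, sae_attn = biased_attention(visual, visual, visual, mask, ~mask, boost=2.0)
_, sae_base = biased_attention(visual, visual, visual, mask, ~mask, boost=0.0)

gain = cae_attn[mask][:, anomaly_words].sum(axis=1) - cae_base[mask][:, anomaly_words].sum(axis=1)
print("extra attention mass on anomaly words per masked token:", gain)
```

In this sketch, tokens outside the mask receive no bias, so their attention rows are unchanged. This loosely mirrors the stated goal of keeping content in other regions intact.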
