Fugu-MT Paper Translation (Summary): ZSG-IAD: A Multimodal Framework for Zero-Shot Grounded Industrial Anomaly Detection

Paper summary: ZSG-IAD: A Multimodal Framework for Zero-Shot Grounded Industrial Anomaly Detection

arxiv url: http://arxiv.org/abs/2604.17949v1
Date: Mon, 20 Apr 2026 08:30:09 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.765921
Title: ZSG-IAD: A Multimodal Framework for Zero-Shot Grounded Industrial Anomaly Detection
Title (reference translation): ZSG-IAD: A Multimodal Framework for Zero-Shot Grounded Industrial Anomaly Detection
Authors: Qiuhui Chen, Jiaxiang Song, Shuai Tan, Weimin Zhong
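Abstract summary: ZSG-IAD is a framework for zero-shot industrial anomaly detection. It generates structured anomaly reports and pixel-level anomaly masks. Code and annotations will be released to support future research on trustworthy industrial anomaly detection systems.
Reference score (independently computed attention score): 14.275030421757867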
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep learning-based industrial anomaly detectors often behave as black boxes, making it hard to justify decisions with physically meaningful defect evidence. We propose ZSG-IAD, a multimodal vision-language framework for zero-shot grounded industrial anomaly detection. Given RGB images, sensor images, and 3D point clouds, ZSG-IAD generates structured anomaly reports and pixel-level anomaly masks. ZSG-IAD introduces a language-guided two-hop grounding module: (1) anomaly-related sentences select evidence-like latent slots distilled from multimodal features, yielding coarse spatial support; (2) selected slots modulate feature maps via channel-spatial gating and a lightweight decoder to produce fine-grained masks. To improve reliability, we further apply Executable-Rule GRPO with verifiable rewards to promote structured outputs, anomaly-region consistency, and reasoning-conclusion coherence. Experiments across multiple industrial anomaly benchmarks show strong zero-shot performance and more transparent, physically grounded explanations than prior methods. We will release code and annotations to support future research on trustworthy industrial anomaly detection systems.
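The two-hop grounding idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no equations, so the dot-product similarity, the softmax slot weighting, the sigmoid gates, the single linear layer standing in for the "lightweight decoder", and the assumption that the slot dimension equals the number of feature channels are all illustrative choices. Every function and parameter name below is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hop1_select_slots(sentence_vec, slot_vecs, slot_maps):
    """Hop 1 (assumed form): weight each latent slot by its similarity to the
    sentence embedding, then blend the slots' spatial maps into coarse support."""
    scores = [sum(q * s for q, s in zip(sentence_vec, slot)) for slot in slot_vecs]
    weights = softmax(scores)
    num_pixels = len(slot_maps[0])
    coarse = [
        sum(w * slot_map[p] for w, slot_map in zip(weights, slot_maps))
        for p in range(num_pixels)
    ]
    return weights, coarse


def hop2_refine(features, slot_vecs, weights, coarse, decoder_weights, decoder_bias):
    """Hop 2 (assumed form): gate features per channel and per pixel, then apply
    a linear decoder and a sigmoid to get a per-pixel anomaly probability."""
    num_channels = len(features)
    # Channel gate from the weighted-average slot vector (slot dim == channels here).
    channel_gate = []
    for c in range(num_channels):
        blended = sum(w * slot[c] for w, slot in zip(weights, slot_vecs))
        channel_gate.append(sigmoid(blended))
    mask = []
    for p, spatial_gate in enumerate(coarse):
        logit = decoder_bias
        for c in range(num_channels):
            logit += decoder_weights[c] * features[c][p] * channel_gate[c] * spatial_gate
        mask.append(sigmoid(logit))
    return mask
```

Here `features` is a list of channels, each a flat list of per-pixel values, and `slot_maps` holds one flat spatial map per slot. A real system would use learned projections and a convolutional decoder; the sketch only shows how a sentence can first pick slots and then steer the feature map toward a mask.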
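Abstract (reference translation): Industrial anomaly detectors based on deep learning often behave as black boxes, which makes it difficult to justify their decisions with physically meaningful defect evidence. We propose ZSG-IAD, a multimodal vision-language framework for zero-shot grounded industrial anomaly detection. Given RGB images, sensor images, and 3D point clouds, ZSG-IAD generates structured anomaly reports and pixel-level anomaly masks. ZSG-IAD introduces a language-guided two-hop grounding module in which (1) anomaly-related sentences select evidence-like latent slots distilled from multimodal features, yielding coarse spatial support, and (2) the selected slots modulate feature maps via channel-spatial gating and a lightweight decoder to produce fine-grained masks. To improve reliability, we further apply Executable-Rule GRPO with verifiable rewards to promote structured outputs, anomaly-region consistency, and reasoning-conclusion coherence. Experiments on multiple industrial anomaly benchmarks show strong zero-shot performance and more transparent, physically grounded explanations than prior methods. We will release code and annotations to support future research on trustworthy industrial anomaly detection systems.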
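The abstract names three targets for the verifiable rewards (structured outputs, anomaly-region consistency, and reasoning-conclusion coherence) without specifying the rules. The sketch below shows one way such executable rules could look, under loudly stated assumptions: the JSON report schema with `evidence` and `conclusion` keys, the equal weighting of the three checks, and all function names are hypothetical and not taken from the paper. The group-relative normalization follows the usual GRPO recipe of standardizing rewards within a group of sampled outputs; whether the paper uses exactly this form is not stated.

```python
import json
import statistics

REQUIRED_KEYS = ("evidence", "conclusion")  # hypothetical report schema


def parse_report(report_text):
    """Return the report dict if it is a JSON object with the required keys
    and a list-valued 'evidence' field, otherwise None."""
    try:
        report = json.loads(report_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(report, dict):
        return None
    if any(key not in report for key in REQUIRED_KEYS):
        return None
    if not isinstance(report["evidence"], list):
        return None
    return report


def rule_reward(report_text, mask):
    """Average of three 0/1 rule checks; 0.0 when the structure check fails."""
    report = parse_report(report_text)
    if report is None:
        return 0.0
    says_anomalous = report["conclusion"] == "anomalous"
    mask_has_anomaly = any(any(row) for row in mask)
    cites_evidence = len(report["evidence"]) > 0
    checks = [
        1.0,  # structured output: the report parsed and matched the schema
        1.0 if says_anomalous == mask_has_anomaly else 0.0,  # anomaly-region consistency
        1.0 if says_anomalous == cites_evidence else 0.0,  # reasoning-conclusion coherence
    ]
    return sum(checks) / len(checks)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled outputs."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because each check is a deterministic rule over the generated report and mask, the reward can be recomputed and audited without a learned reward model, which is the property the term "verifiable rewards" points to.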

Related papers