Fugu-MT Paper Translation (Abstract): IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning

Paper overview: IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning

arxiv url: http://arxiv.org/abs/2508.10681v1
Date: Thu, 14 Aug 2025 14:24:47 GMT
Status: Translation complete
Last updated in system: 2025-08-15 22:24:48.355067
Title: IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning
Title (reference translation): IADGPT: A Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning
Authors: Mengyang Zhao, Teng Fu, Haiyang Yu, Ke Niu, Bin Li
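Abstract summary: Few-Shot Industrial Anomaly Detection (FS-IAD) has important applications in automating industrial quality inspection. We propose IADGPT, a unified framework for performing FS-IAD in a human-like manner. We present a new dataset comprising 100K images across 400 diverse industrial product categories.
Reference score (independently calculated attention metric): 18.078896149087576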
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Few-Shot Industrial Anomaly Detection (FS-IAD) has important applications in automating industrial quality inspection. Recently, some FS-IAD methods based on Large Vision-Language Models (LVLMs) have been proposed with some achievements through prompt learning or fine-tuning. However, existing LVLMs focus on general tasks but lack basic industrial knowledge and reasoning capabilities related to FS-IAD, making these methods far from specialized human quality inspectors. To address these challenges, we propose a unified framework, IADGPT, designed to perform FS-IAD in a human-like manner, while also handling associated localization and reasoning tasks, even for diverse and novel industrial products. To this end, we introduce a three-stage progressive training strategy inspired by humans. Specifically, the first two stages gradually guide IADGPT in acquiring fundamental industrial knowledge and discrepancy awareness. In the third stage, we design an in-context learning-based training paradigm, enabling IADGPT to leverage a few-shot image as the exemplars for improved generalization to novel products. In addition, we design a strategy that enables IADGPT to output image-level and pixel-level anomaly scores using the logits output and the attention map, respectively, in conjunction with the language output to accomplish anomaly reasoning. To support our training, we present a new dataset comprising 100K images across 400 diverse industrial product categories with extensive attribute-level textual annotations. Experiments indicate IADGPT achieves considerable performance gains in anomaly detection and demonstrates competitiveness in anomaly localization and reasoning. We will release our dataset in camera-ready.
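Abstract (reference translation): Few-Shot Industrial Anomaly Detection (FS-IAD) has important applications in automating industrial quality inspection. Recently, FS-IAD methods based on Large Vision-Language Models (LVLMs) have been proposed. However, existing LVLMs focus on general tasks and lack the basic industrial knowledge and reasoning capabilities that FS-IAD requires, which leaves these methods far from specialized human quality inspectors. To address these challenges, we propose IADGPT, a unified framework that performs FS-IAD in a human-like manner while also handling the associated localization and reasoning tasks, even for diverse and novel industrial products. To this end, we introduce a three-stage progressive training strategy inspired by humans. Specifically, the first two stages gradually guide IADGPT in acquiring fundamental industrial knowledge and discrepancy awareness. In the third stage, we design an in-context learning-based training paradigm that lets IADGPT use few-shot images as exemplars to improve generalization to novel products. In addition, we design a strategy by which IADGPT outputs image-level and pixel-level anomaly scores using the logits output and the attention map, respectively, in conjunction with the language output to accomplish anomaly reasoning. To support training, we present a new dataset comprising 100K images across 400 diverse industrial product categories. Experiments indicate that IADGPT achieves considerable performance gains in anomaly detection and is competitive in anomaly localization and reasoning. We will release the dataset with the camera-ready version.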
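
The abstract states that image-level scores come from the logits output and pixel-level scores from the attention map, but gives no formulas. The following is only a rough sketch of one way such scores could be computed, not the authors' method. It assumes the image-level score is a two-way softmax over the logits of two hypothetical answer tokens ("anomalous" vs. "normal"), and that the pixel-level map is a min-max normalized grid of per-patch attention weights. All function and parameter names are illustrative.

```python
import math


def image_level_score(logit_anomalous: float, logit_normal: float) -> float:
    """Two-way softmax over the logits of two hypothetical answer tokens.

    Assumption: the model answers a yes/no style question, and the
    probability mass on the "anomalous" token is used as the score.
    """
    m = max(logit_anomalous, logit_normal)  # subtract max for numerical stability
    e_a = math.exp(logit_anomalous - m)
    e_n = math.exp(logit_normal - m)
    return e_a / (e_a + e_n)


def pixel_level_map(patch_attention: list[float], grid_h: int, grid_w: int) -> list[list[float]]:
    """Min-max normalize per-patch attention weights and reshape to a grid.

    Assumption: one attention weight per image patch, in row-major order.
    Upsampling the grid to full image resolution is left out of this sketch.
    """
    if len(patch_attention) != grid_h * grid_w:
        raise ValueError("attention length must equal grid_h * grid_w")
    lo, hi = min(patch_attention), max(patch_attention)
    span = hi - lo
    # A constant attention map carries no localization signal; return zeros.
    norm = [(a - lo) / span if span > 0 else 0.0 for a in patch_attention]
    return [norm[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]
```

For example, `image_level_score(2.0, -2.0)` gives a score close to 1, while equal logits give exactly 0.5. How IADGPT actually selects tokens, layers, or attention heads is not specified in the abstract.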

Related papers