Fugu-MT Paper Translation (Abstract): Text-Guided Multimodal Unified Industrial Anomaly Detection

Paper overview: Text-Guided Multimodal Unified Industrial Anomaly Detection

arxiv url: http://arxiv.org/abs/2604.22899v1
Date: Fri, 24 Apr 2026 13:21:22 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.041977
Title: Text-Guided Multimodal Unified Industrial Anomaly Detection
Title (reference translation): Text-Guided Multimodal Unified Industrial Anomaly Detection
Authors: Zewen Li, Shuo Ye, Zitong Yu, Weicheng Xie, Linlin Shen
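Abstract summary: This work proposes a multimodal industrial anomaly detection framework guided by text semantics. The framework consists of two core modules: a Geometry-Aware Cross-Modal Mapper and an Object-Conditioned Textual Feature Adaptor. The method achieves state-of-the-art performance in classification and localization under unsupervised settings.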
Reference score (independently computed attention level): 71.95719669933312
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Industrial anomaly detection based on RGB-3D multimodal data has emerged as a mainstream paradigm for intelligent quality inspection. However, existing unsupervised methods suffer from two critical limitations: ambiguous cross-modal alignment caused by the lack of high-level semantic guidance and insufficient geometric modeling for RGB-to-3D feature mapping. To address these issues, we propose a unified multimodal industrial anomaly detection framework guided by text semantics. The framework consists of two core modules: a Geometry-Aware Cross-Modal Mapper to preserve geometric structure during modality conversion, and an Object-Conditioned Textual Feature Adaptor to align multimodal features with semantic priors. Furthermore, we establish a unified learning paradigm for multimodal industrial anomaly detection, which breaks the one-model-one-class constraint and enables accurate anomaly detection across diverse classes using a single model. Extensive experiments on the MVTec 3D-AD and Eyecandies datasets demonstrate that our method achieves state-of-the-art performance in classification and localization under unsupervised settings.
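Abstract (reference translation): Industrial anomaly detection based on RGB-3D multimodal data has emerged as a mainstream paradigm for intelligent quality inspection. Existing unsupervised methods, however, have two limitations: ambiguous cross-modal alignment due to the lack of high-level semantic guidance, and insufficient geometric modeling in RGB-to-3D feature mapping. To address these problems, the authors propose a unified multimodal industrial anomaly detection framework guided by text semantics. The framework consists of two core modules: a Geometry-Aware Cross-Modal Mapper, which preserves geometric structure during modality conversion, and an Object-Conditioned Textual Feature Adaptor, which aligns multimodal features with semantic priors. They also establish a unified learning paradigm for multimodal industrial anomaly detection that breaks the one-model-one-class constraint and enables accurate anomaly detection across diverse classes with a single model. Extensive experiments on the MVTec 3D-AD and Eyecandies datasets show state-of-the-art performance in classification and localization under unsupervised settings.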
