Fugu-MT Paper Translation (Abstract): Align3D-AD: Cross-Modal Feature Alignment and Dual-Prompt Learning for Zero-shot 3D Anomaly Detection

Paper abstract: Align3D-AD: Cross-Modal Feature Alignment and Dual-Prompt Learning for Zero-shot 3D Anomaly Detection

arxiv url: http://arxiv.org/abs/2605.05850v1
Date: Thu, 07 May 2026 08:24:44 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.624667
Title: Align3D-AD: Cross-Modal Feature Alignment and Dual-Prompt Learning for Zero-shot 3D Anomaly Detection
Title (reference translation): Align3D-AD: Cross-Modal Feature Alignment and Dual-Prompt Learning for Zero-Shot 3D Anomaly Detection
Authors: Letian Bai, Xuanming Cao, Juan Du, Chengyu Tao
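Abstract summary: Zero-shot 3D anomaly detection aims to identify anomalies without access to training data from the target categories. Existing methods rely mainly on projecting 3D observations into multi-view representations that primarily capture geometric cues. This paper proposes Align3D-AD, a unified two-stage framework that uses the RGB modality from auxiliary categories as cross-modal guidance.
Reference score (independently computed attention measure): 2.08058961865456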
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Zero-shot 3D anomaly detection aims to identify anomalies without access to training data from target categories. However, existing methods mainly rely on projecting 3D observations into multi-view representations that primarily capture geometric cues rather than realistic visual semantics and process them with vision encoders pretrained on RGB data, leading to a significant domain gap between the encoder and the projected representations. To address this issue, we propose Align3D-AD, a unified two-stage framework that leverages the RGB modality from auxiliary categories as cross-modal guidance for zero-shot 3D anomaly detection. First, we introduce a cross-modal feature alignment paradigm that maps rendering features into the RGB semantic space. Unlike prior works that implicitly rely on pretrained encoders, our method enables direct semantic transfer from RGB observations. A semantic consistency reweighting strategy is further introduced to refine feature alignment by reweighting local regions according to holistic semantic consistency. Second, we propose a modality-aware prompt learning framework with dual-prompt contrastive alignment. By assigning independent prompts to RGB-aligned and rendering features, our method captures complementary semantics across modalities, while the contrastive alignment further enhances prompt representations to improve discriminability. Extensive experiments on MVTec3D-AD, Eyecandies, and Real3D-AD demonstrate that Align3D-AD consistently outperforms existing zero-shot methods under both one-vs-rest and cross-dataset settings, highlighting its generalization capability and robustness. Code and the dataset will be made available once our paper is accepted.
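Abstract (reference translation): Zero-shot 3D anomaly detection aims to identify anomalies without access to training data from the target categories. However, existing methods mainly rely on projecting 3D observations into multi-view representations, which primarily capture geometric cues rather than realistic visual semantics, and on processing them with vision encoders pretrained on RGB data; this creates a large domain gap between the encoder and the projected representations. To address this problem, we propose Align3D-AD, a unified two-stage framework that uses the RGB modality from auxiliary categories as cross-modal guidance for zero-shot 3D anomaly detection. First, we introduce a cross-modal feature alignment paradigm that maps rendering features into the RGB semantic space. Unlike prior work that relies implicitly on pretrained encoders, our method enables direct semantic transfer from RGB observations. We further introduce a semantic consistency reweighting strategy that refines feature alignment by reweighting local regions according to holistic semantic consistency. Second, we propose a modality-aware prompt learning framework with dual-prompt contrastive alignment. By assigning independent prompts to the RGB-aligned features and the rendering features, the method captures complementary semantics across modalities, while the contrastive alignment further strengthens the prompt representations and improves discriminability. Extensive experiments on MVTec3D-AD, Eyecandies, and Real3D-AD show that Align3D-AD consistently outperforms existing zero-shot methods in both one-vs-rest and cross-dataset settings, highlighting its generalization ability and robustness. The code and dataset will be made available once the paper is accepted.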
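Illustrative sketch (not the authors' code): the abstract describes mapping rendering features into the RGB semantic space and reweighting local regions by holistic semantic consistency, but gives no formulas. The NumPy snippet below is one possible reading under stated assumptions: a single linear projection `proj`, a per-patch cosine distance as the alignment term, a mean-pooled RGB feature as the "holistic" descriptor, and softmax weights. The function name, shapes, and weighting direction are all hypothetical and may differ from the actual method.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def reweighted_alignment_loss(render_feats, rgb_feats, proj):
    """Hypothetical alignment loss with consistency-based reweighting.

    render_feats: (N, D_r) patch features from rendered 3D views
    rgb_feats:    (N, D)   patch features from the paired RGB image
    proj:         (D_r, D) linear map into the RGB feature space (assumed)
    Returns the weighted loss and the per-patch weights.
    """
    mapped = l2_normalize(render_feats @ proj)
    target = l2_normalize(rgb_feats)

    # Local term: cosine distance between each mapped patch and its RGB patch.
    local_dist = 1.0 - np.sum(mapped * target, axis=-1)

    # Holistic descriptor: mean of the RGB patch features (an assumption).
    global_rgb = l2_normalize(target.mean(axis=0))

    # Consistency of each mapped patch with the holistic RGB semantics.
    consistency = mapped @ global_rgb

    # Softmax over patches, so more consistent patches get larger weights.
    weights = np.exp(consistency - consistency.max())
    weights = weights / weights.sum()

    return float(np.sum(weights * local_dist)), weights
```

In a training loop, `proj` would be the learned component, fit on auxiliary categories that have paired RGB and rendered views; at test time only the rendered views of the target category would be needed. Whether consistent patches should be up-weighted or down-weighted is a design choice the abstract does not specify.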
