Fugu-MT Paper Translation (Abstract): TIPS Over Tricks: Simple Prompts for Effective Zero-shot Anomaly Detection

Paper summary: TIPS Over Tricks: Simple Prompts for Effective Zero-shot Anomaly Detection

arxiv url: http://arxiv.org/abs/2602.03594v1
Date: Tue, 03 Feb 2026 14:48:11 GMT
Status: Translation complete
Last updated in system: 2026-02-04 18:37:15.519832
Title: TIPS Over Tricks: Simple Prompts for Effective Zero-shot Anomaly Detection
Title (reference translation): TIPS over Tricks: Simple Prompts for Effective Zero-shot Anomaly Detection
Authors: Alireza Salehi, Ehsan Karami, Sepehr Noey, Sahand Noey, Makoto Yamada, Reshad Hosseini, Mohammad Sabokrou
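Abstract summary: Anomaly detection identifies departures from expected behavior in safety-critical settings. Our pipeline improves image-level performance by 1.1-3.9% and pixel-level performance by 1.5-6.9% across seven industrial datasets.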
Reference score (independently computed attention score): 19.691698434869657
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Anomaly detection identifies departures from expected behavior in safety-critical settings. When target-domain normal data are unavailable, zero-shot anomaly detection (ZSAD) leverages vision-language models (VLMs). However, CLIP's coarse image-text alignment limits both localization and detection due to (i) spatial misalignment and (ii) weak sensitivity to fine-grained anomalies; prior work compensates with complex auxiliary modules yet largely overlooks the choice of backbone. We revisit the backbone and use TIPS, a VLM trained with spatially aware objectives. While TIPS alleviates CLIP's issues, it exposes a distributional gap between global and local features. We address this with decoupled prompts (fixed for image-level detection, learnable for pixel-level localization) and by injecting local evidence into the global score. Without CLIP-specific tricks, our TIPS-based pipeline improves image-level performance by 1.1-3.9% and pixel-level performance by 1.5-6.9% across seven industrial datasets, delivering strong generalization with a lean architecture. Code is available at github.com/AlirezaSalehy/Tipsomaly.
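Abstract (reference translation): Anomaly detection identifies departures from expected behavior in safety-critical settings. When normal data from the target domain are unavailable, zero-shot anomaly detection (ZSAD) relies on vision-language models (VLMs). However, CLIP's coarse image-text alignment limits both localization and detection because of (i) spatial misalignment and (ii) weak sensitivity to fine-grained anomalies; prior work compensates with complex auxiliary modules but largely overlooks the choice of backbone. We revisit the backbone and use TIPS, a VLM trained with spatially aware objectives. TIPS alleviates CLIP's issues but exposes a distributional gap between global and local features. We address this with decoupled prompts, fixed for image-level detection and learnable for pixel-level localization, and by injecting local evidence into the global score. Without CLIP-specific tricks, our TIPS-based pipeline improves image-level performance by 1.1-3.9% and pixel-level performance by 1.5-6.9% across seven industrial datasets, achieving strong generalization with a lean architecture. The code is available at github.com/AlirezaSalehy/Tipsomaly.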
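
The abstract's idea of decoupled prompts plus local evidence in the global score can be illustrated with a toy sketch. This is not the authors' implementation (see the repository above for that). Everything here is an assumption made for illustration: the function names `anomaly_prob` and `score_image`, the two-way softmax over "normal" and "abnormal" text embeddings, the temperature of 0.07, and the choice to blend the global score with the maximum patch score using a fixed `weight`. The "learned" prompt pair is represented by plain vectors standing in for embeddings that would be trained in practice.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anomaly_prob(feature, normal_text, abnormal_text, temperature=0.07):
    """Two-way softmax over similarities to a normal/abnormal prompt pair.

    The temperature value is an assumption, not taken from the paper.
    """
    s_n = cosine(feature, normal_text) / temperature
    s_a = cosine(feature, abnormal_text) / temperature
    m = max(s_n, s_a)  # subtract the max for numerical stability
    e_n = math.exp(s_n - m)
    e_a = math.exp(s_a - m)
    return e_a / (e_n + e_a)


def score_image(global_feat, patch_feats, fixed_prompts, learned_prompts, weight=0.5):
    """Return (image_score, anomaly_map) for one image.

    fixed_prompts and learned_prompts are (normal, abnormal) embedding pairs.
    The max-pooling fusion below is a hypothetical choice for this sketch.
    """
    # Pixel-level branch: each patch is scored against the learnable prompt pair.
    anomaly_map = [anomaly_prob(p, *learned_prompts) for p in patch_feats]
    # Image-level branch: the global embedding is scored against the fixed prompt pair.
    global_score = anomaly_prob(global_feat, *fixed_prompts)
    # Inject local evidence: blend the global score with the strongest patch response.
    image_score = (1 - weight) * global_score + weight * max(anomaly_map)
    return image_score, anomaly_map


# Toy 2-D embeddings: axis 0 stands for "normal", axis 1 for "abnormal".
fixed = ([1.0, 0.0], [0.0, 1.0])
learned = ([1.0, 0.0], [0.0, 1.0])
global_feat = [1.0, 0.1]  # the global view looks normal
patches = [[1.0, 0.0], [0.9, 0.1], [0.1, 1.0]]  # the last patch looks abnormal
score, amap = score_image(global_feat, patches, fixed, learned)
```

In this toy case the global embedding alone looks normal, but the single abnormal patch raises the blended image score, which is the intuition behind letting local evidence inform image-level detection.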


Related papers