Fugu-MT Paper Translation (Abstract): VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction

Paper overview: VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction

arxiv url: http://arxiv.org/abs/2603.13964v1
Date: Sat, 14 Mar 2026 14:21:37 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:35.513834
Title: VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction
Authors: Hiroto Nakata, Yawen Zou, Shunsuke Sakai, Shun Maeda, Chunzhi Gu, Yijin Wei, Shangce Gao, Chao Zhang
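Abstract summary: VID-AD is a dataset for logical anomaly detection under vision-induced distraction. It comprises 10 manufacturing scenarios and five capture conditions, totaling 50 one-class tasks and 10,395 images. The authors also propose a language-based anomaly detection framework that relies solely on text descriptions generated from normal images.
Reference score (independently computed attention measure): 8.968670701930714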
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Logical anomaly detection in industrial inspection remains challenging due to variations in visual appearance (e.g., background clutter, illumination shift, and blur), which often distract vision-centric detectors from identifying rule-level violations. However, existing benchmarks rarely provide controlled settings where logical states are fixed while such nuisance factors vary. To address this gap, we introduce VID-AD, a dataset for logical anomaly detection under vision-induced distraction. It comprises 10 manufacturing scenarios and five capture conditions, totaling 50 one-class tasks and 10,395 images. Each scenario is defined by two logical constraints selected from quantity, length, type, placement, and relation, with anomalies including both single-constraint and combined violations. We further propose a language-based anomaly detection framework that relies solely on text descriptions generated from normal images. Using contrastive learning with positive texts and contradiction-based negative texts synthesized from these descriptions, our method learns embeddings that capture logical attributes rather than low-level features. Extensive experiments demonstrate consistent improvements over baselines across the evaluated settings. The dataset is available at: https://github.com/nkthiroto/VID-AD.
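The contrastive idea in the abstract (pull a description toward positive texts, push it away from contradiction-based negative texts) can be illustrated with a small sketch. The abstract does not specify the loss function, the text encoder, or any hyperparameters, so everything below is an assumption: a generic triplet-margin loss on cosine similarity, hand-written 3-dimensional "embeddings", and a hypothetical margin of 0.2. It is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: zero once the positive text is more similar to the
    anchor than the negative text by at least `margin` (assumed value)."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


# Toy embeddings, made up for illustration only.
anchor = [1.0, 0.0, 0.0]    # description of a normal image
positive = [0.9, 0.1, 0.0]  # text consistent with the description
negative = [0.0, 1.0, 0.0]  # text contradicting a logical attribute

# Well-separated embeddings incur no loss.
loss_good = triplet_margin_loss(anchor, positive, negative)
# Swapping the roles simulates an encoder that confuses the two.
loss_bad = triplet_margin_loss(anchor, negative, positive)
print(loss_good, loss_bad)
```

In this toy setup the loss vanishes when the consistent text already sits closer to the anchor than the contradicting text, and it grows when the two are confused, which is the behavior a contrastive objective of this kind is meant to reward.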

Related papers