Fugu-MT paper translation (abstract): Zoom In, Reason Out: Efficient Far-field Anomaly Detection in Expressway Surveillance Videos via Focused VLM Reasoning Guided by Bayesian Inference

arxiv url: http://arxiv.org/abs/2604.23724v1
Date: Sun, 26 Apr 2026 14:09:55 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.522975
Title: Zoom In, Reason Out: Efficient Far-field Anomaly Detection in Expressway Surveillance Videos via Focused VLM Reasoning Guided by Bayesian Inference
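Title (reference translation): Zoom In, Reason Out: Highly Efficient Far-field Anomaly Detection in Expressway Surveillance Videos via Focused VLM Reasoning Guided by Bayesian Inference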
Authors: Xiaowei Mao, Bowen Sui, Weijie Zhang, Yawen Yang, Shengnan Guo, Shilong Zhao, Jiaqi Lin, Tingrui Wu, Youfang Lin, Huaiyu Wa
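Abstract summary: Expressway video anomaly detection is essential for safety management. The paper proposes an online Bayesian inference module that overcomes poor generalization across varying expressway environments. It reports real-time efficiency and explainability while generalizing across diverse expressway conditions.
Reference score (attention score computed by this site): 25.036113180047845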
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Expressway video anomaly detection is essential for safety management. However, identifying anomalies across diverse scenes remains challenging, particularly for far-field targets exhibiting subtle abnormal vehicle motions. While Vision-Language Models (VLMs) demonstrate strong semantic reasoning capabilities, processing global frames causes attention dilution for these far-field objects and incurs prohibitive computational costs. To address these issues, we propose VIBES, an asynchronous collaborative framework utilizing VLMs guided by Bayesian inference. Specifically, to overcome poor generalization across varying expressway environments, we introduce an online Bayesian inference module. This module continuously evaluates vehicle trajectories to dynamically update the probabilistic boundaries of normal driving behaviors, serving as an asynchronous trigger to precisely localize anomalies in space and time. Instead of processing the continuous video stream, the VLM processes only the localized visual regions indicated by the trigger. This targeted visual input prevents attention dilution and enables accurate semantic reasoning. Extensive evaluations demonstrate that VIBES improves detection accuracy for far-field anomalies and reduces computational overhead, achieving high real-time efficiency and explainability while demonstrating generalization across diverse expressway conditions.
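Abstract (reference translation): Expressway video anomaly detection is essential for safety management. However, identifying anomalies across diverse scenes remains difficult, especially for far-field targets that exhibit subtle abnormal vehicle motions. Vision-Language Models (VLMs) show strong semantic reasoning capabilities, but processing global frames dilutes attention on these far-field objects and incurs prohibitive computational cost. To address these issues, the paper proposes VIBES, an asynchronous collaborative framework that uses VLMs guided by Bayesian inference. Specifically, to overcome poor generalization across varying expressway environments, it introduces an online Bayesian inference module. This module continuously evaluates vehicle trajectories and dynamically updates the probabilistic boundaries of normal driving behavior, acting as an asynchronous trigger that localizes anomalies in space and time. Instead of processing the continuous video stream, the VLM processes only the localized visual regions indicated by the trigger. This targeted visual input prevents attention dilution and enables accurate semantic reasoning. Extensive evaluations show that VIBES improves detection accuracy for far-field anomalies and reduces computational overhead, achieving high real-time efficiency and explainability while generalizing across diverse expressway conditions.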
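
The abstract describes the trigger only at a high level, so the following is a minimal sketch of the general idea and not the VIBES implementation. It assumes a single scalar feature per vehicle (speed), a conjugate Normal-Inverse-Gamma posterior updated one observation at a time, and a simple "z times the predictive scale" rule as the boundary of normal behavior. The names `OnlineSpeedModel`, `Trigger`, `pad_box`, and `observe`, the warm-up length, and the padding factor are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class OnlineSpeedModel:
    """Normal-Inverse-Gamma posterior over the mean and variance of speed."""
    mu: float = 0.0      # prior mean (carries little weight, see kappa)
    kappa: float = 1e-3  # prior pseudo-count on the mean
    alpha: float = 1.0   # shape of the variance prior
    beta: float = 1.0    # scale of the variance prior
    n: int = 0           # number of observations absorbed so far

    def update(self, x: float) -> None:
        """Conjugate single-observation update of the posterior."""
        kappa_n = self.kappa + 1.0
        # beta must use the old mean and old kappa, so update it first.
        self.beta += 0.5 * self.kappa * (x - self.mu) ** 2 / kappa_n
        self.alpha += 0.5
        self.mu = (self.kappa * self.mu + x) / kappa_n
        self.kappa = kappa_n
        self.n += 1

    def predictive_scale(self) -> float:
        """Scale of the Student-t posterior predictive distribution."""
        return math.sqrt(self.beta * (self.kappa + 1.0) / (self.alpha * self.kappa))

    def is_anomalous(self, x: float, z: float = 3.0) -> bool:
        """Assumed rule: flag values more than z predictive scales from the mean."""
        return abs(x - self.mu) > z * self.predictive_scale()


@dataclass
class Trigger:
    frame_idx: int
    roi: Box  # padded region that would be cropped and sent to the VLM


def pad_box(box: Box, pad: float, width: int, height: int) -> Box:
    """Enlarge a box by a fraction of its size and clip it to the frame."""
    x0, y0, x1, y1 = box
    dx, dy = pad * (x1 - x0), pad * (y1 - y0)
    return (max(0, int(x0 - dx)), max(0, int(y0 - dy)),
            min(width, int(x1 + dx)), min(height, int(y1 + dy)))


def observe(model: OnlineSpeedModel, frame_idx: int, speed: float, box: Box,
            frame_size: Tuple[int, int], warmup: int = 10) -> Optional[Trigger]:
    """Return a Trigger for an out-of-boundary speed, else absorb the sample."""
    if model.n >= warmup and model.is_anomalous(speed):
        return Trigger(frame_idx, pad_box(box, 0.5, *frame_size))
    model.update(speed)  # only non-flagged samples refine the boundary
    return None
```

In this sketch, `observe` would be called once per tracked vehicle per frame. A returned `Trigger` carries the frame index and a padded region of interest, which a separate asynchronous worker could crop and pass to a VLM. That VLM call is left out because the abstract does not specify it.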

Related papers

DiffAttn: Diffusion-Based Drivers' Visual Attention Prediction with LLM-Enhanced Semantic Reasoning [4.57409624068048]
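This paper proposes DiffAttn, which emulates drivers' perception patterns to advance visual attention prediction for intelligent vehicles. The framework has the potential to improve in-cabin human-machine interaction, risk perception, and driver state measurement in intelligent vehicles.
Paper reference translation (metadata) (2026-03-30T10:24:20Z)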
Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection [52.5174167737992]
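Video anomaly detection (VAD) aims to identify anomalous events in videos. This paper proposes SteerVAD, which shifts MLLM-based VAD from passively reading out internal representations to actively steering and rectifying them. The method achieves state-of-the-art performance among tuning-free approaches while requiring only 1% of the training data.
Paper reference translation (metadata) (2026-02-27T13:48:50Z)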
A Unified Reasoning Framework for Holistic Zero-Shot Video Anomaly Analysis [64.42659342276117]
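Most video anomaly research stops at frame-wise detection and offers little insight into why an event is anomalous. Recent video anomaly localization and video anomaly understanding methods improve explainability but remain data-dependent and task-specific. This paper proposes a unified reasoning framework that bridges the gap between temporal detection, spatial localization, and textual explanation.
Paper reference translation (metadata) (2025-11-02T14:49:08Z)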
SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model [52.47816604709358]
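Video anomaly detection (VAD) aims to identify unexpected events in videos and is widely applied in safety-critical domains. Vision-language models (VLMs) show strong multimodal reasoning capabilities and offer new opportunities for anomaly detection. SlowFastVAD is a hybrid framework that integrates a fast anomaly detector with a slow anomaly detector.
Paper reference translation (metadata) (2025-04-14T15:30:03Z)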
Multi-Modality Driven LoRA for Adverse Condition Depth Estimation [61.525312117638116]
The paper proposes Multi-Modality Driven LoRA (MMD-LoRA) for adverse condition depth estimation. It consists of two core components: Prompt Driven Domain Alignment (PDDA) and Visual-Text Consistent Contrastive Learning (VTCCL). It achieves state-of-the-art performance on the nuScenes and Oxford RobotCar datasets.
Paper reference translation (metadata) (2024-12-28T14:23:58Z)
Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
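This paper proposes a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL), built on pre-trained vision-language models (VLMs). The proposed method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
Paper reference translation (metadata) (2024-08-12T03:31:29Z)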
MGFN: Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection [39.923871347007875]
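This paper proposes a novel glance-and-focus network that integrates spatial-temporal information for accurate anomaly detection. Existing approaches that use feature magnitudes to represent the degree of anomaly typically ignore the effect of scene variations. To make feature magnitudes more discriminative for anomaly detection, the paper proposes a feature amplification mechanism and a magnitude contrastive loss.
Paper reference translation (metadata) (2022-11-28T07:10:36Z)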

The related papers list is generated automatically from the titles and abstracts of papers on this site.
