Fugu-MT Paper Translation (Abstract): Instance-level Visual Active Tracking with Occlusion-Aware Planning

Paper overview: Instance-level Visual Active Tracking with Occlusion-Aware Planning

arxiv url: http://arxiv.org/abs/2604.21453v1
Date: Thu, 23 Apr 2026 09:11:50 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:06.399916
Title: Instance-level Visual Active Tracking with Occlusion-Aware Planning
Title (reference translation): Instance-Level Visual Active Tracking Using Occlusion-Aware Planning
Authors: Haowei Sun, Kai Zhou, Hao Gao, Shiteng Zhang, Jinwu Hu, Xutao Wen, Qixiang Ye, Mingkui Tan
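Abstract summary: Visual Active Tracking (VAT) aims to have a camera follow a target in 3D space. In real-world deployment, VAT faces two key bottlenecks: confusion from visually similar distractors and severe failure under occlusion. The authors propose OA-VAT, a unified pipeline with three complementary modules.
Reference score (independently computed attention score): 61.982298426203165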
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual Active Tracking (VAT) aims to control cameras to follow a target in 3D space, which is critical for applications like drone navigation and security surveillance. However, it faces two key bottlenecks in real-world deployment: confusion from visually similar distractors caused by insufficient instance-level discrimination and severe failure under occlusions due to the absence of active planning. To address these, we propose OA-VAT, a unified pipeline with three complementary modules. First, a training-free Instance-Aware Offline Prototype Initialization aggregates multi-view augmented features via DINOv3 to construct discriminative instance prototypes, mitigating distractor confusion. Second, an Online Prototype Enhancement Tracker enhances prototypes online and integrates a confidence-aware Kalman filter for stable tracking under appearance and motion changes. Third, an Occlusion-Aware Trajectory Planner, trained on our new Planning-20k dataset, uses conditional diffusion to generate obstacle-avoiding paths for occlusion recovery. Experiments demonstrate OA-VAT achieves 0.93 average SR on UnrealCV (+2.2% vs. SOTA TrackVLA), 90.8% average CAR on real-world datasets (+12.1% vs. SOTA GC-VAT), and 81.6% TSR on a DJI Tello drone. Running at 35 FPS on an RTX 3090, it delivers robust, real-time performance for practical deployment.
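The first module in the abstract builds an instance prototype from multi-view features and uses it to separate the target from look-alike distractors. The sketch below illustrates that general idea only: it assumes the features are already extracted (for example by a backbone such as DINOv3, which is not called here), and the mean-of-normalized-features aggregation, the cosine-similarity matching, and the names `build_prototype` and `pick_target` are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`, guarding against zero norm."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def build_prototype(view_features):
    """Average L2-normalized per-view features into one unit-length prototype.

    Assumption: simple mean pooling; the paper may aggregate differently.
    """
    feats = l2_normalize(np.asarray(view_features, dtype=np.float64))
    return l2_normalize(feats.mean(axis=0))


def pick_target(prototype, candidate_features):
    """Return (best_index, cosine_scores) for candidates against the prototype."""
    cands = l2_normalize(np.asarray(candidate_features, dtype=np.float64))
    scores = cands @ prototype
    return int(np.argmax(scores)), scores


# Toy 3-D "features": three augmented views of the target, then three candidates.
views = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, -0.1, 0.0]]
candidates = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.05], [0.0, 0.0, 1.0]]

prototype = build_prototype(views)
best, scores = pick_target(prototype, candidates)
print(best, np.round(scores, 3))
```

Normalizing before averaging keeps any single high-magnitude view from dominating the prototype, and normalizing again afterwards makes the dot product a cosine similarity.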
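Abstract (reference translation): Visual Active Tracking (VAT) aims to control cameras to follow a target in 3D space, which is important for applications such as drone navigation and security surveillance. In real-world deployment, however, it faces two major bottlenecks: confusion from visually similar distractors, caused by insufficient instance-level discrimination, and failure under occlusion, caused by the lack of active planning. The authors therefore propose OA-VAT, a unified pipeline with three complementary modules. First, a training-free Instance-Aware Offline Prototype Initialization aggregates multi-view augmented features via DINOv3 to build discriminative instance prototypes, mitigating distractor confusion. Second, an Online Prototype Enhancement Tracker enhances the prototypes online and integrates a confidence-aware Kalman filter to track stably under appearance and motion changes. Third, an Occlusion-Aware Trajectory Planner, trained on the new Planning-20k dataset, uses conditional diffusion to generate obstacle-avoiding paths for occlusion recovery. OA-VAT achieves 0.93 average SR on UnrealCV (+2.2% vs. SOTA TrackVLA), 90.8% average CAR on real-world datasets (+12.1% vs. SOTA GC-VAT), and 81.6% TSR on a DJI Tello drone. Running at 35 FPS on an RTX 3090, it delivers robust, real-time performance for practical deployment.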
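The abstract mentions a confidence-aware Kalman filter but does not say how confidence enters the filter. One common way to sketch the idea, assumed here for illustration, is a constant-velocity filter whose measurement noise grows as detection confidence drops, and which skips the update entirely below a threshold. The class name `ConfidenceAwareKalman`, the `r / conf` scaling, and all default parameters are hypothetical choices, not taken from the paper.

```python
import numpy as np


class ConfidenceAwareKalman:
    """Constant-velocity Kalman filter over the state [x, y, vx, vy].

    Assumption: measurement noise is scaled by 1 / confidence, and
    measurements with confidence below `min_conf` are ignored.
    """

    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0, min_conf=0.1):
        self.x = np.array([x0, y0, 0.0, 0.0], dtype=np.float64)
        self.P = np.eye(4)                       # state covariance
        self.F = np.eye(4)                       # constant-velocity transition
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4))                # only position is observed
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = q * np.eye(4)                   # process noise
        self.r = r                               # base measurement noise
        self.min_conf = min_conf

    def predict(self):
        """Propagate the state one step and return the predicted position."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z, conf):
        """Fuse measurement z = (x, y), trusting it in proportion to conf."""
        if conf < self.min_conf:
            return self.x[:2].copy()             # keep the prediction as-is
        R = (self.r / conf) * np.eye(2)          # low confidence -> large noise
        innovation = np.asarray(z, dtype=np.float64) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()


kf = ConfidenceAwareKalman(0.0, 0.0)
kf.predict()
print(kf.update([10.0, 0.0], conf=1.0))
```

With this scaling, a confident detection pulls the estimate strongly toward the measurement, while a doubtful one (for example during partial occlusion) moves it only slightly, so the track coasts on its motion model instead of jumping to a distractor.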
