Fugu-MT Paper Translation (Summary): COVTrack++: Learning Open-Vocabulary Multi-Object Tracking from Continuous Videos via a Synergistic Paradigm

Paper overview: COVTrack++: Learning Open-Vocabulary Multi-Object Tracking from Continuous Videos via a Synergistic Paradigm

arxiv url: http://arxiv.org/abs/2603.24016v1
Date: Wed, 25 Mar 2026 07:20:27 GMT
Status: Translation complete
Last updated in system: 2026-03-26 21:06:11.180269
Title: COVTrack++: Learning Open-Vocabulary Multi-Object Tracking from Continuous Videos via a Synergistic Paradigm
Title (reference translation): COVTrack++: Learning Open-Vocabulary Multi-Object Tracking from Continuous Videos via a Synergistic Paradigm
Authors: Zekun Qian, Wei Feng, Ruize Han, Junhui Hou
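Abstract summary: C-TAO is the first continuously annotated training set for Open-Vocabulary Multi-Object Tracking (OVMOT). For the framework bottleneck, COVTrack++ is a synergistic framework that achieves a bidirectional reciprocal mechanism between detection and association through three modules. In experiments on TAO, novel TETA reaches 35.4% and 30.5% on the validation and test sets, with novel AssocA improved by 4.8% and novel LocA by 5.8%.
Reference score (independently calculated attention score): 59.26203051651017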
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-Object Tracking (MOT) has traditionally focused on a few specific categories, restricting its applicability to real-world scenarios involving diverse objects. Open-Vocabulary Multi-Object Tracking (OVMOT) addresses this by enabling tracking of arbitrary categories, including novel objects unseen during training. However, current progress is constrained by two challenges: the lack of continuously annotated video data for training, and the lack of a customized OVMOT framework to synergistically handle detection and association. We address the data bottleneck by constructing C-TAO, the first continuously annotated training set for OVMOT, which increases annotation density by 26x over the original TAO and captures smooth motion dynamics and intermediate object states. For the framework bottleneck, we propose COVTrack++, a synergistic framework that achieves a bidirectional reciprocal mechanism between detection and association through three modules: (1) Multi-Cue Adaptive Fusion (MCF) dynamically balances appearance, motion, and semantic cues for association feature learning; (2) Multi-Granularity Hierarchical Aggregation (MGA) exploits hierarchical spatial relationships in dense detections, where visible child nodes (e.g., object parts) assist occluded parent objects (e.g., whole body) for association feature enhancement; (3) Temporal Confidence Propagation (TCP) recovers flickering detections through high-confidence tracked objects boosting low-confidence candidates across frames, stabilizing trajectories. Extensive experiments on TAO demonstrate state-of-the-art performance, with novel TETA reaching 35.4% and 30.5% on validation and test sets, improving novel AssocA by 4.8% and novel LocA by 5.8% over previous methods, and show strong zero-shot generalization on BDD100K. The code and dataset will be publicly available.
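Abstract (reference translation): Traditionally, Multi-Object Tracking (MOT) has focused on a few specific categories. Open-Vocabulary Multi-Object Tracking (OVMOT) addresses this by enabling tracking of arbitrary categories, including novel objects unseen during training. However, current progress is limited by two challenges: the lack of continuously annotated video data for training, and the lack of a customized OVMOT framework that handles detection and association synergistically. We address the data bottleneck by constructing C-TAO, the first continuously annotated training set for OVMOT. For the framework bottleneck, we propose COVTrack++, a synergistic framework that achieves a bidirectional reciprocal mechanism between detection and association through three modules: (1) Multi-Cue Adaptive Fusion (MCF) dynamically balances appearance, motion, and semantic cues for association feature learning; (2) Multi-Granularity Hierarchical Aggregation (MGA) exploits hierarchical spatial relationships in dense detections, where visible child nodes (e.g., object parts) assist occluded parent objects (e.g., whole body) to enhance association features; (3) Temporal Confidence Propagation (TCP) recovers flickering detections by letting high-confidence tracked objects boost low-confidence candidates across frames. Extensive experiments on TAO show novel TETA reaching 35.4% and 30.5% on the validation and test sets, with novel AssocA improved by 4.8% and novel LocA by 5.8%, and strong zero-shot generalization on BDD100K. The code and dataset will be made publicly available.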
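
Illustrative sketch (not the authors' code): the abstract describes Temporal Confidence Propagation (TCP) only at a high level, as high-confidence tracked objects boosting low-confidence candidates across frames. The Python below shows one simple way such a boost could work, using box overlap as the link between a track and a candidate. The function names, the thresholds (`high_thr`, `iou_thr`), the blending `weight`, and the use of IoU are all assumptions made for illustration; the paper's actual formulation may differ substantially.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def propagate_confidence(tracks, candidates,
                         high_thr=0.7, iou_thr=0.5, weight=0.5):
    """Hypothetical confidence boost (assumed rule, not from the paper).

    tracks:     list of (box, score) for objects tracked in the previous frame
    candidates: list of (box, score) for detections in the current frame
    Returns the candidates with scores raised where a confident track overlaps.
    """
    boosted = []
    for box, score in candidates:
        support = 0.0
        for t_box, t_score in tracks:
            if t_score < high_thr:
                continue  # only high-confidence tracks may lend support
            overlap = iou(box, t_box)
            if overlap >= iou_thr:
                support = max(support, t_score * overlap)
        # Blend the detector score with the track support; never lower a score.
        new_score = max(score, (1.0 - weight) * score + weight * support)
        boosted.append((box, new_score))
    return boosted


tracks = [((0, 0, 10, 10), 0.9)]
candidates = [((0, 0, 10, 10), 0.2), ((50, 50, 60, 60), 0.2)]
result = propagate_confidence(tracks, candidates)
```

In this toy input, the first candidate coincides with a confident track and has its score raised, while the second has no overlapping track and keeps its original score. A detection that flickers below a tracker's acceptance threshold could in this way be retained when it lies where a confident track was just seen.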
