Fugu-MT Paper Translation (Abstract): NOVA: Next-step Open-Vocabulary Autoregression for 3D Multi-Object Tracking in Autonomous Driving

Paper summary: NOVA: Next-step Open-Vocabulary Autoregression for 3D Multi-Object Tracking in Autonomous Driving

arxiv url: http://arxiv.org/abs/2603.06254v1
Date: Fri, 06 Mar 2026 13:12:28 GMT
Status: Translation complete
Last updated in system: 2026-03-09 13:17:45.786847
Title: NOVA: Next-step Open-Vocabulary Autoregression for 3D Multi-Object Tracking in Autonomous Driving
Authors: Kai Luo, Xu Wang, Rui Fan, Kailun Yang
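Abstract summary: Generalizing across unknown targets is critical for open-world perception. Next-step Open-Vocabulary Autoregression (NOVA) shifts 3D tracking from traditional fragmented distance-based matching toward generative spatio-temporal semantic modeling.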
Reference score (independently computed attention metric): 16.99502075851124
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generalizing across unknown targets is critical for open-world perception, yet existing 3D Multi-Object Tracking (3D MOT) pipelines remain limited by closed-set assumptions and "semantic-blind" heuristics. To address this, we propose Next-step Open-Vocabulary Autoregression (NOVA), an innovative paradigm that shifts 3D tracking from traditional fragmented distance-based matching toward generative spatio-temporal semantic modeling. NOVA reformulates 3D trajectories as structured spatio-temporal semantic sequences, enabling the simultaneous encoding of physical motion continuity and deep linguistic priors. By leveraging the autoregressive capabilities of Large Language Models (LLMs), we transform the tracking task into a principled process of next-step sequence completion. This mechanism allows the model to explicitly utilize the hierarchical structure of language space to resolve fine-grained semantic ambiguities and maintain identity consistency across complex long-range sequences through high-level commonsense reasoning. Extensive experiments on nuScenes, V2X-Seq-SPD, and KITTI demonstrate the superior performance of NOVA. Notably, on the nuScenes dataset, NOVA achieves an AMOTA of 22.41% for Novel categories, yielding a significant 20.21% absolute improvement over the baseline. These gains are realized through a compact 0.5B autoregressive model. Code will be available at https://github.com/xifen523/NOVA.

Related papers