Fugu-MT Paper Translation (Abstract): Latent-WAM: Latent World Action Modeling for End-to-End Autonomous Driving

Paper abstract: Latent-WAM: Latent World Action Modeling for End-to-End Autonomous Driving

arxiv url: http://arxiv.org/abs/2603.24581v1
Date: Wed, 25 Mar 2026 17:56:07 GMT
Status: Translation complete
System update date: 2026-03-26 21:06:11.428999
Title: Latent-WAM: Latent World Action Modeling for End-to-End Autonomous Driving
Title (reference translation): Latent-WAM: Latent World Action Modeling for End-to-End Autonomous Driving
Authors: Linbo Wang, Yupeng Zheng, Qiang Chen, Shiwei Li, Yichen Zhang, Zebin Xing, Qichao Zhang, Xiang Li, Deheng Qian, Pengxuan Yang, Yihang Dong, Ce Hao, Xiaoqing Ye, Junyu Han, Yifeng Pan, Dongbin Zhao
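Abstract summary: Latent-WAM is an efficient end-to-end autonomous driving framework. It achieves strong trajectory planning through spatially-aware and dynamics-informed latent world representations.
Reference score (independently computed attention score): 40.17041348571413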
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce Latent-WAM, an efficient end-to-end autonomous driving framework that achieves strong trajectory planning through spatially-aware and dynamics-informed latent world representations. Existing world-model-based planners suffer from inadequately compressed representations, limited spatial understanding, and underutilized temporal dynamics, resulting in sub-optimal planning under constrained data and compute budgets. Latent-WAM addresses these limitations with two core modules: a Spatial-Aware Compressive World Encoder (SCWE) that distills geometric knowledge from a foundation model and compresses multi-view images into compact scene tokens via learnable queries, and a Dynamic Latent World Model (DLWM) that employs a causal Transformer to autoregressively predict future world status conditioned on historical visual and motion representations. Extensive experiments on NAVSIM v2 and HUGSIM demonstrate new state-of-the-art results: 89.3 EPDMS on NAVSIM v2 and 28.9 HD-Score on HUGSIM, surpassing the best prior perception-free method by 3.2 EPDMS with significantly less training data and a compact 104M-parameter model.
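Abstract (reference translation): This work introduces Latent-WAM, an efficient end-to-end autonomous driving framework that achieves strong trajectory planning through spatially-aware and dynamics-informed latent world representations. Existing world-model-based planners suffer from inadequately compressed representations, limited spatial understanding, and underutilized temporal dynamics, which leads to sub-optimal planning under constrained data and compute budgets. Latent-WAM addresses these limitations with two core modules. The Spatial-Aware Compressive World Encoder (SCWE) distills geometric knowledge from a foundation model and compresses multi-view images into compact scene tokens via learnable queries. The Dynamic Latent World Model (DLWM) uses a causal Transformer to autoregressively predict future world states conditioned on historical visual and motion representations. Extensive experiments on NAVSIM v2 and HUGSIM show new state-of-the-art results of 89.3 EPDMS on NAVSIM v2 and 28.9 HD-Score on HUGSIM, surpassing the best prior perception-free method by 3.2 EPDMS with significantly less training data and a compact 104M-parameter model.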
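
Illustrative sketch (not from the paper): the abstract states that SCWE compresses multi-view images into compact scene tokens via learnable queries, but gives no further detail. One common way to realize query-based compression is cross-attention, where a small set of query vectors attends over many image tokens. The code below is a minimal sketch under that assumption. The function name, the token counts, the feature dimension, and the omission of learned projection matrices are all hypothetical simplifications, and random vectors stand in for trained queries and image features.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compress_with_queries(image_tokens, queries):
    """Single-head cross-attention without learned projections (assumption).

    image_tokens: N vectors of dimension d (flattened multi-view features)
    queries:      K vectors of dimension d (stand-ins for learnable queries)
    Returns K scene tokens, each a softmax-weighted average of the image tokens.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    scene_tokens = []
    for q in queries:
        # Scaled dot-product score between this query and every image token.
        scores = [scale * sum(qi * ti for qi, ti in zip(q, t)) for t in image_tokens]
        weights = softmax(scores)
        # Weighted average of image tokens, one output dimension at a time.
        token = [sum(w * t[j] for w, t in zip(weights, image_tokens)) for j in range(d)]
        scene_tokens.append(token)
    return scene_tokens


rng = random.Random(0)
d, n_image_tokens, n_queries = 8, 48, 4  # hypothetical sizes, not from the paper
image_tokens = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_image_tokens)]
queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_queries)]

scene_tokens = compress_with_queries(image_tokens, queries)
print(len(image_tokens), "->", len(scene_tokens))  # prints: 48 -> 4
```

The number of output tokens is set by the number of queries rather than by the number of input tokens, which is what makes this pattern a compression step: adding more camera views increases N but leaves K unchanged.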
