Fugu-MT Paper Translation (Abstract): Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models

Paper overview: Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models

arxiv url: http://arxiv.org/abs/2510.16729v2
Date: Wed, 29 Oct 2025 06:53:04 GMT
Status: Translation complete
Last updated in system: 2025-10-30 18:06:01.956187
Title: Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models
Title (reference translation): Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models
Authors: Jianbiao Mei, Yu Yang, Xuemeng Yang, Licheng Wen, Jiajun Lv, Botian Shi, Yong Liu
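Abstract summary: The Implicit Residual World Model (IR-WM) focuses on modeling the current state and evolution of the world. IR-WM achieves top performance in both 4D occupancy forecasting and trajectory planning.
Reference score (independently computed attention score): 28.777224599594717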
License: http://creativecommons.org/licenses/by/4.0/
Abstract: End-to-end autonomous driving systems increasingly rely on vision-centric world models to understand and predict their environment. However, a common ineffectiveness in these models is the full reconstruction of future scenes, which expends significant capacity on redundantly modeling static backgrounds. To address this, we propose IR-WM, an Implicit Residual World Model that focuses on modeling the current state and evolution of the world. IR-WM first establishes a robust bird's-eye-view representation of the current state from the visual observation. It then leverages the BEV features from the previous timestep as a strong temporal prior and predicts only the "residual", i.e., the changes conditioned on the ego-vehicle's actions and scene context. To alleviate error accumulation over time, we further apply an alignment module to calibrate semantic and dynamic misalignments. Moreover, we investigate different forecasting-planning coupling schemes and demonstrate that the implicit future state generated by world models substantially improves planning accuracy. On the nuScenes benchmark, IR-WM achieves top performance in both 4D occupancy forecasting and trajectory planning.
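Abstract (reference translation): End-to-end autonomous driving systems increasingly rely on vision-centric world models to understand and predict their environment. A common inefficiency in these models, however, is that they fully reconstruct future scenes, which spends significant capacity on redundantly modeling static backgrounds. To address this, we propose IR-WM, an Implicit Residual World Model that focuses on modeling the current state and evolution of the world. IR-WM first builds a robust bird's-eye-view (BEV) representation of the current state from visual observations. It then uses the BEV features from the previous timestep as a strong temporal prior and predicts only the "residual", that is, the changes conditioned on the ego vehicle's actions and the scene context. To mitigate error accumulation over time, an alignment module is additionally applied to calibrate semantic and dynamic misalignments. The work also examines different forecasting-planning coupling schemes and shows that the implicit future state generated by the world model substantially improves planning accuracy. On the nuScenes benchmark, IR-WM achieves top performance in both 4D occupancy forecasting and trajectory planning.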
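The residual idea in the abstract can be illustrated with a toy sketch: instead of regenerating the whole future state, each step adds a predicted change to the previous state. This is only an illustration under stated assumptions, not the IR-WM architecture. The names `residual_rollout` and `toy_residual`, the 2x2 grid standing in for a BEV feature map, and the scalar "action" are all hypothetical, and the alignment module mentioned in the abstract is omitted.

```python
from typing import Callable, List

# Toy stand-in for a BEV feature map: H x W grid with a single channel (assumption).
Grid = List[List[float]]


def residual_rollout(
    bev_now: Grid,
    actions: List[float],
    predict_residual: Callable[[Grid, float], Grid],
) -> List[Grid]:
    """Roll the state forward by adding a predicted residual at each step."""
    states: List[Grid] = []
    prev = bev_now
    for action in actions:
        # Predict only the change, conditioned on the previous state and the action.
        delta = predict_residual(prev, action)
        # Next state = previous state (temporal prior) + residual.
        nxt = [
            [p + d for p, d in zip(prev_row, delta_row)]
            for prev_row, delta_row in zip(prev, delta)
        ]
        states.append(nxt)
        prev = nxt
    return states


def toy_residual(bev: Grid, action: float) -> Grid:
    """Hypothetical residual predictor: only non-zero ("dynamic") cells change."""
    return [[action if value != 0.0 else 0.0 for value in row] for row in bev]


# One dynamic cell (1.0) on an otherwise static background (0.0).
bev0 = [[0.0, 1.0], [0.0, 0.0]]
future = residual_rollout(bev0, actions=[0.5, 0.5], predict_residual=toy_residual)
```

In this toy setup the static cells receive a zero residual and are carried over unchanged, which is the intuition behind not re-modeling the background at every step. A learned model would replace `toy_residual` with a network.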

Related papers