Fugu-MT Paper Translation (Abstract): ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving

Paper overview: ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving

arxiv url: http://arxiv.org/abs/2508.11428v1
Date: Fri, 15 Aug 2025 12:06:55 GMT
Status: Translation complete
System update date: 2025-08-18 14:51:23.943093
Title: ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving
Authors: Jingyu Li, Bozhou Zhang, Xin Jin, Jiankang Deng, Xiatian Zhu, Li Zhang
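Abstract summary: Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of autonomous driving. The authors propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer.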
Reference score (attention score computed independently by this site): 64.12414815634847
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Autonomous driving requires rich contextual comprehension and precise predictive reasoning to navigate dynamic and complex environments safely. Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge. VLMs provide interpretability and robust action prediction through their ability to understand multi-modal context, while DWMs excel in generating detailed and plausible future driving scenarios essential for proactive planning. Integrating VLMs with DWMs is an intuitive, promising, yet understudied strategy to exploit the complementary strengths of accurate behavioral prediction and realistic scene generation. Nevertheless, this integration presents notable challenges, particularly in effectively connecting action-level decisions with high-fidelity pixel-level predictions and maintaining computational efficiency. In this paper, we propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer to form a unified imagination-and-planning loop. The driving agent predicts initial driving trajectories based on multi-modal inputs, guiding the scene imaginer to generate corresponding future scenarios. These imagined scenarios are subsequently utilized to iteratively refine the driving agent's planning decisions. To address efficiency and predictive accuracy challenges inherent in this integration, we introduce an early stopping mechanism and a trajectory selection strategy. Extensive experimental validation on the nuScenes and NAVSIM datasets demonstrates the robustness and superiority of ImagiDrive over previous alternatives under both open-loop and closed-loop conditions.
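The imagination-and-planning loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how early stopping or trajectory selection work, so the convergence test (`tol`), the medoid-style `select_trajectory`, and the `toy_agent` / `toy_imaginer` stand-ins are all assumptions made up for this example.

```python
import math
from typing import Any, Callable, List, Optional, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def max_waypoint_gap(a: Trajectory, b: Trajectory) -> float:
    """Largest Euclidean distance between corresponding waypoints."""
    return max(math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b))


def select_trajectory(candidates: List[Trajectory]) -> Trajectory:
    """ASSUMED selection rule: the candidate closest to all the others (medoid)."""
    return min(
        candidates,
        key=lambda c: sum(max_waypoint_gap(c, other) for other in candidates),
    )


def imagine_and_plan(
    agent: Callable[[Any, Optional[Any]], Trajectory],
    imaginer: Callable[[Any, Trajectory], Any],
    observation: Any,
    max_rounds: int = 4,
    tol: float = 0.1,
) -> Tuple[Trajectory, int]:
    """Alternate planning and imagination; return (trajectory, rounds used)."""
    candidates: List[Trajectory] = []
    imagined: Optional[Any] = None
    for _ in range(max_rounds):
        if candidates:
            # Imagine the future scene implied by the latest plan.
            imagined = imaginer(observation, candidates[-1])
        trajectory = agent(observation, imagined)
        # ASSUMED early-stopping rule: stop once the plan barely changes.
        converged = bool(candidates) and max_waypoint_gap(trajectory, candidates[-1]) < tol
        candidates.append(trajectory)
        if converged:
            break
    return select_trajectory(candidates), len(candidates)


# Hypothetical stand-ins for the VLM agent and the DWM imaginer.
def toy_agent(observation: Any, imagined: Optional[Any]) -> Trajectory:
    # Start 1.0 m off-centre, then halve the offset seen in the imagined scene.
    offset = 1.0 if imagined is None else imagined["offset"] / 2
    return [(float(i), offset) for i in range(1, 4)]


def toy_imaginer(observation: Any, trajectory: Trajectory) -> Any:
    # A real imaginer would generate future frames; this only echoes the offset.
    return {"offset": trajectory[0][1]}


if __name__ == "__main__":
    plan, rounds = imagine_and_plan(toy_agent, toy_imaginer, None, max_rounds=8)
    print(rounds, plan)
```

In this toy setup the loop stops as soon as two consecutive plans differ by less than `tol` at every waypoint, which saves the remaining imaginer calls. A real system would replace both stand-ins with model inference and would likely use a learned or rule-based score for selection.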
