Fugu-MT Paper Translation (Abstract): CausalDrive: Real-time Causal World Models for Autonomous Driving

Paper overview: CausalDrive: Real-time Causal World Models for Autonomous Driving

arxiv url: http://arxiv.org/abs/2606.15341v1
Date: Sat, 13 Jun 2026 15:04:44 GMT
Status: Translation complete
Last updated in system: 2026-06-16 16:21:33.372622
Title: CausalDrive: Real-time Causal World Models for Autonomous Driving
Title (reference translation): CausalDrive: Real-time Causal World Models for Autonomous Vehicles
Authors: Tianyi Yan, Huan Zheng, Dubing Chen, Meizhi Qu, Yingying Shen, Lijun Zhou, Mingfei Tu, Bing Wang, Guang Chen, Hangjun Ye, Haiyang Sun, Cheng-zhong Xu, Jianbing Shen
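Abstract summary: We present CausalDrive, a controllable, real-time foundation driving world renderer. CausalDrive operates solely on the initial front-view frame, the ego-vehicle's trajectory, and a macroscopic text prompt. We demonstrate its versatility across three downstream applications: (1) generative closed-loop evaluation with significantly mitigated collision artifacts, (2) large-scale Reinforcement Learning (RL) post-training driven by a Video2Reward module, and (3) real-time human-in-the-loop simulation.
Reference score (independently computed attention score): 60.66609721457312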
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: World models have emerged as a promising paradigm for scaling autonomous driving (AD) data, yet existing video generative models fall short as interactive simulators. Layout-conditioned renderers rely on "oracle" future trajectories of all background agents, rendering them strictly non-reactive. Conversely, pure action-conditioned predictors lack semantic control over complex interactions and suffer from prohibitive diffusion latencies, hindering closed-loop policy learning. To bridge this gap, we present CausalDrive, a controllable, real-time foundation driving world renderer. CausalDrive operates solely on the initial front-view frame, the ego-vehicle's trajectory, and a macroscopic text prompt. By excluding future NPC layouts, we compel the model to intrinsically predict causal interactions, enabling text-driven control over Driving Sociology, allowing users to dynamically orchestrate diverse counterfactual reactions to identical ego-actions. To overcome the efficiency bottleneck and address the covariate shift in autoregressive generation, we propose a novel Context-Forced DMD architecture. This combines continuous flow-matching with a self-correcting distillation objective, achieving interactive speeds of 12 FPS. This breakthrough transforms the passive video generator into a playable neural simulator. We demonstrate its versatility across three downstream applications: (1) generative closed-loop evaluation with significantly mitigated collision artifacts, (2) large-scale Reinforcement Learning (RL) post-training driven by a Video2Reward module, and (3) real-time human-in-the-loop simulation. Extensive experiments validate that policies trained within CausalDrive's reactive scenarios exhibit superior interaction capabilities in the real world.
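Abstract (reference translation): World models have emerged as a promising paradigm for scaling autonomous driving (AD) data, but existing video generative models fall short as interactive simulators. Layout-conditioned renderers depend on "oracle" future trajectories of all background agents and are therefore strictly non-reactive. Conversely, pure action-conditioned predictors lack semantic control over complex interactions and suffer from prohibitive diffusion latencies, which hinders closed-loop policy learning. To bridge this gap, we present CausalDrive, a controllable, real-time foundation driving world renderer. CausalDrive operates solely on the initial front-view frame, the ego-vehicle's trajectory, and a macroscopic text prompt. By excluding future NPC layouts, the model is compelled to intrinsically predict causal interactions. This enables text-driven control over Driving Sociology, so users can dynamically orchestrate diverse counterfactual reactions to identical ego-actions. To overcome the efficiency bottleneck and address covariate shift in autoregressive generation, we propose a novel Context-Forced DMD architecture. It combines continuous flow-matching with a self-correcting distillation objective and achieves interactive speeds of 12 FPS. This breakthrough transforms the passive video generator into a playable neural simulator. We demonstrate its versatility across three downstream applications: (1) generative closed-loop evaluation with significantly mitigated collision artifacts, (2) large-scale Reinforcement Learning (RL) post-training driven by a Video2Reward module, and (3) real-time human-in-the-loop simulation. Extensive experiments validate that policies trained within CausalDrive's reactive scenarios exhibit superior interaction capabilities in the real world.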
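
The abstract describes an autoregressive simulator conditioned on three inputs (the initial front-view frame, the ego-vehicle's trajectory, and a text prompt) that runs at 12 FPS. The following is a minimal sketch of what such a closed-loop rollout could look like. It is not the authors' implementation: `rollout`, `toy_step`, `context_len`, and the `Frame`/`Pose` types are hypothetical names introduced for illustration, and the step function is a stand-in for a real generative model.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]                 # hypothetical placeholder for an image tensor
Pose = Tuple[float, float, float]   # hypothetical ego waypoint: (x, y, heading)
StepFn = Callable[[Sequence[Frame], Pose, str], Frame]

TARGET_FPS = 12                         # interactive speed reported in the abstract
FRAME_BUDGET_MS = 1000.0 / TARGET_FPS   # latency budget per frame, about 83.3 ms


def rollout(step_fn: StepFn, first_frame: Frame, ego_trajectory: Sequence[Pose],
            prompt: str, context_len: int = 4) -> List[Frame]:
    """Generate one frame per ego waypoint, feeding generated frames back in.

    No future layout of other agents is passed in; the only inputs are the
    first frame, the ego trajectory, and the text prompt.
    """
    frames = [first_frame]
    for pose in ego_trajectory:
        # The context is the model's own recent output, not ground truth.
        context = frames[-context_len:]
        frames.append(step_fn(context, pose, prompt))
    return frames


def toy_step(context: Sequence[Frame], pose: Pose, prompt: str) -> Frame:
    """Stand-in for the world model: shifts the last frame by the ego x offset."""
    return [value + pose[0] for value in context[-1]]


frames = rollout(toy_step, [0.0, 1.0],
                 [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
                 prompt="aggressive cut-in")
```

Because each step consumes previously generated frames, errors can compound over a rollout; this is the covariate shift in autoregressive generation that the abstract says the Context-Forced DMD architecture is meant to address.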

Related papers