Fugu-MT Paper Translation (Summary): Xiaomi EV World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving

Paper Summary: Xiaomi EV World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving

arxiv url: http://arxiv.org/abs/2605.18137v2
Date: Tue, 19 May 2026 09:33:34 GMT
Status: Translation complete
System update date: 2026-05-20 15:03:08.566633
Title: Xiaomi EV World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving
Authors: Lijun Zhou, Hongcheng Luo, Zhenxin Zhu, Cheng Chi, Mingfei Tu, Kaixin Xiong, Lei Gong, Zhanqian Wu, Zehan Zhang, Fangzhen Li, Hao Li, Yingying Shen, Jiale He, Haohui Zhu, Shan Zhao, Kai Wang, Zhiwei Zhan, Yuechuan Pu, Kaiyuan Tan, Ruiling Yang, Xianqi Wang, Tianyi Yan, Jiawei Zhou, Lei Zhang, Jingyang Zhao, Xi Zhou, Chitian Sun, Chenming Wu, Jiong Deng, Hongwei Xie, Ming Lu, Kun Ma, Long Chen, Guang Chen, Hangjun Ye, Bing Wang, Haiyang Sun
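Abstract summary: This report presents an integrated technical system addressing two core capabilities of world models for autonomous driving. For world representation, we propose WorldRec, a feed-forward reconstruction architecture driven by sparse scene queries. For world generation, we propose WorldGen, a two-stage training framework consisting of bidirectional pretraining followed by causal fine-tuning.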
Reference score (attention score computed by the site): 51.90209659403234
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This report presents a unified technical system addressing the two core capabilities of world models for autonomous driving: world representation and world generation. For world representation, we propose WorldRec, a feed-forward reconstruction architecture driven by sparse scene queries. WorldRec initializes structured queries in 3D space and uses them to aggregate cross-view, cross-temporal features, thereby naturally enforcing spatial consistency across frames and yielding compact yet high-fidelity 3D Gaussian scene representations. For world generation, we propose WorldGen, a two-stage training framework: bidirectional pretraining followed by causal fine-tuning through three progressive stages (Teacher Forcing, ODE distillation, and DMD), enabling high-quality online causal video generation in as few as 4 denoising steps. Building on both modules, we further introduce the Joint World Model (JWM), which deeply integrates WorldRec and WorldGen to achieve synergistic gains in generation stability, cross-frame consistency, and visual fidelity, providing a solid foundation for closed-loop simulation, data synthesis, and end-to-end training in autonomous driving.
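The abstract states that WorldRec initializes structured queries in 3D space and uses them to aggregate cross-view, cross-temporal features, but this summary does not give the mechanism. The following is a hypothetical sketch of one common way to do that kind of aggregation, not the authors' method. It assumes a pinhole camera for every (frame, camera) pair, treats past frames as extra views with their own pose, and uses a plain mean where a real model would likely learn attention weights. `View`, `project`, `bilinear_sample`, and `aggregate_query` are illustrative names only.

```python
# Hypothetical sketch: aggregate multi-view, multi-frame features at a 3D query point.
# Not the WorldRec implementation; pinhole cameras and mean pooling are assumptions.
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
FeatureMap = List[List[List[float]]]  # indexed as fmap[row v][col u][channel]


@dataclass
class View:
    """One camera at one timestep (hypothetical container).

    Past frames are simply extra views with their own ego pose, which is how
    this sketch covers cross-temporal aggregation.
    """
    rotation: Sequence[Sequence[float]]  # world-to-camera 3x3 rotation
    translation: Vec3                    # world-to-camera translation
    fx: float
    fy: float
    cx: float
    cy: float
    features: FeatureMap


def project(view: View, p: Vec3) -> Optional[Tuple[float, float]]:
    """Pinhole projection of world point p to pixel (u, v); None if behind the camera."""
    xc, yc, zc = (
        sum(view.rotation[i][j] * p[j] for j in range(3)) + view.translation[i]
        for i in range(3)
    )
    if zc <= 1e-6:
        return None
    return view.fx * xc / zc + view.cx, view.fy * yc / zc + view.cy


def bilinear_sample(fmap: FeatureMap, u: float, v: float) -> Optional[List[float]]:
    """Bilinearly interpolate a feature vector at (u, v); None if out of bounds."""
    h, w = len(fmap), len(fmap[0])
    if not (0.0 <= u <= w - 1 and 0.0 <= v <= h - 1):
        return None
    u0, v0 = int(u), int(v)  # floor, since u and v are non-negative here
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    return [
        (1 - du) * (1 - dv) * fmap[v0][u0][k]
        + du * (1 - dv) * fmap[v0][u1][k]
        + (1 - du) * dv * fmap[v1][u0][k]
        + du * dv * fmap[v1][u1][k]
        for k in range(len(fmap[v0][u0]))
    ]


def aggregate_query(query: Vec3, views: List[View]) -> Optional[List[float]]:
    """Average the features sampled from every view in which the query is visible."""
    samples = []
    for view in views:
        uv = project(view, query)
        if uv is None:
            continue  # query is behind this camera
        feat = bilinear_sample(view.features, *uv)
        if feat is not None:
            samples.append(feat)
    if not samples:
        return None  # query not visible in any view
    return [sum(channel) / len(samples) for channel in zip(*samples)]
```

Because every query is tied to one fixed 3D location, all views and frames contribute features for the same point in space, which is one plausible reading of how query-based aggregation "naturally" enforces cross-frame consistency. Decoding each aggregated query into 3D Gaussian parameters would be the next step; this summary does not specify how.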
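The abstract says WorldGen is pretrained bidirectionally and then fine-tuned causally so that video can be generated online in as few as 4 denoising steps. The summary does not describe the mechanism, so the following is only a hypothetical sketch under two stated assumptions: the causal model restricts attention with a frame-level block-causal mask (a common choice for autoregressive video diffusion, not confirmed here), and inference uses a fixed 4-step schedule whose timestep values are invented for illustration. `block_causal_mask`, `generate_online`, and the `denoise` callable are placeholder names; Teacher Forcing, ODE distillation, and DMD are training procedures and are not modeled.

```python
# Hypothetical sketch of causal, few-step online generation (not the WorldGen code).
from typing import Callable, List, Sequence

Frame = List[float]
# A denoiser maps (noisy frame, timestep, past clean frames) -> less noisy frame.
Denoiser = Callable[[Frame, float, List[Frame]], Frame]


def block_causal_mask(num_frames: int, tokens_per_frame: int) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k.

    Attention is full within a frame and causal across frames: a token never
    sees tokens of a later frame. A bidirectional model would be all True.
    """
    n = num_frames * tokens_per_frame
    return [
        [k // tokens_per_frame <= q // tokens_per_frame for k in range(n)]
        for q in range(n)
    ]


def generate_online(
    denoise: Denoiser,
    noise: Sequence[Frame],
    timesteps: Sequence[float] = (1.0, 0.75, 0.5, 0.25),  # assumed 4-step schedule
) -> List[Frame]:
    """Roll out frames one at a time, calling the denoiser len(timesteps) times per frame.

    Each frame is conditioned only on frames already produced, so a frame can be
    emitted as soon as it is finished (online generation).
    """
    history: List[Frame] = []
    for frame_noise in noise:
        x = list(frame_noise)
        for t in timesteps:
            x = denoise(x, t, history)  # sees past frames only, never future ones
        history.append(x)
    return history
```

The cost per generated frame is fixed at `len(timesteps)` denoiser calls regardless of video length, which is why a small step count matters for online use.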


Related Papers