Fugu-MT paper translation (abstract): WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving

Paper abstract: WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving

arxiv url: http://arxiv.org/abs/2509.23402v1
Date: Sat, 27 Sep 2025 16:47:44 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.2065
Title: WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
Authors: Ziyue Zhu, Zhanqian Wu, Zhenxin Zhu, Lijun Zhou, Haiyang Sun, Bing Wan, Kun Ma, Guang Chen, Hangjun Ye, Jin Xie, Jian Yang,
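Abstract summary: We propose WorldSplat, a novel feed-forward framework for 4D driving-scene generation. The method generates consistent multi-track videos in two steps. Experiments on benchmark datasets show that WorldSplat generates high-fidelity, temporally and spatially consistent novel-view driving videos.
Reference score (in-house attention metric): 21.778139777889397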
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in driving-scene generation and reconstruction have demonstrated significant potential for enhancing autonomous driving systems by producing scalable and controllable training data. Existing generation methods primarily focus on synthesizing diverse and high-fidelity driving videos; however, due to limited 3D consistency and sparse viewpoint coverage, they struggle to support convenient and high-quality novel-view synthesis (NVS). Conversely, recent 3D/4D reconstruction approaches have significantly improved NVS for real-world driving scenes, yet inherently lack generative capabilities. To overcome this dilemma between scene generation and reconstruction, we propose WorldSplat, a novel feed-forward framework for 4D driving-scene generation. Our approach generates consistent multi-track videos through two key steps: (i) we introduce a 4D-aware latent diffusion model that integrates multi-modal information to produce pixel-aligned 4D Gaussians in a feed-forward manner; (ii) we then refine the novel-view videos rendered from these Gaussians using an enhanced video diffusion model. Extensive experiments on benchmark datasets demonstrate that WorldSplat generates high-fidelity, temporally and spatially consistent multi-track novel-view driving videos.
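The abstract does not define "pixel-aligned 4D Gaussians". As a rough illustration only, and not WorldSplat's implementation, the common idea in feed-forward Gaussian splatting is that every pixel of every frame yields one Gaussian whose centre is found by back-projecting a per-pixel depth through a pinhole camera; tagging each Gaussian with its frame timestamp makes the set time-indexed. This is a minimal sketch, assuming a pinhole camera, isotropic Gaussians, and depth and color supplied directly as inputs (in a feed-forward generator a network would predict them). The names `PixelGaussian` and `unproject_pixels` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


@dataclass
class PixelGaussian:
    """One Gaussian primitive anchored to a single pixel (hypothetical layout)."""
    center: Tuple[float, float, float]  # 3D mean in camera coordinates
    scale: float                        # isotropic std-dev (a simplification)
    opacity: float
    color: Color
    t: float                            # frame timestamp: the "4D" index in this sketch


def unproject_pixels(
    depth: Sequence[Sequence[float]],
    colors: Sequence[Sequence[Color]],
    fx: float, fy: float, cx: float, cy: float,
    t: float,
    opacity: float = 1.0,
) -> List[PixelGaussian]:
    """Turn an H x W depth map into H * W pixel-aligned Gaussians.

    (u, v) are integer pixel coordinates; fx, fy, cx, cy are pinhole intrinsics.
    """
    gaussians: List[PixelGaussian] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Pinhole back-projection of pixel (u, v) at depth z along the optical axis.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # Heuristic scale: the width of one pixel's footprint at depth z.
            scale = z / fx
            gaussians.append(PixelGaussian((x, y, z), scale, opacity, colors[v][u], t))
    return gaussians


# Usage: a 2 x 2 frame at constant depth 2.0 with the principal point at its centre.
frame = unproject_pixels(
    depth=[[2.0, 2.0], [2.0, 2.0]],
    colors=[[(1.0, 0.0, 0.0)] * 2] * 2,
    fx=1.0, fy=1.0, cx=0.5, cy=0.5,
    t=0.0,
)
```

Tying one primitive to each pixel fixes the number of Gaussians by image resolution, which is what lets this kind of design emit a full scene in one forward pass instead of running per-scene optimization.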


Related papers