Fugu-MT Paper Translation (Abstract): RAYNOVA: 3D-Geometry-Free Auto-Regressive Driving World Modeling with Unified Spatio-Temporal Representation

Paper overview: RAYNOVA: 3D-Geometry-Free Auto-Regressive Driving World Modeling with Unified Spatio-Temporal Representation

arxiv url: http://arxiv.org/abs/2602.20685v1
Date: Tue, 24 Feb 2026 08:41:40 GMT
Status: Translation complete
Last updated in system: 2026-02-25 17:34:53.679443
Title: RAYNOVA: 3D-Geometry-Free Auto-Regressive Driving World Modeling with Unified Spatio-Temporal Representation
Title (reference translation): RAYNOVA: 3D-Geometry-Free Autoregressive Driving World Modeling with a Unified Spatio-Temporal Representation
Authors: Yichen Xie, Chensheng Peng, Mazen Abdelfattah, Yihan Hu, Jiezhi Yang, Eric Higgins, Ryan Brigden, Masayoshi Tomizuka, Wei Zhan
Abstract summary: RAYNOVA is a geometry-free world model that employs a dual-causal autoregressive framework. Our code will be released at http://yichen928.github.io/raynova.
Reference score (independently computed attention metric): 51.441415833480505
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: World foundation models aim to simulate the evolution of the real world with physically plausible behavior. Unlike prior methods that handle spatial and temporal correlations separately, we propose RAYNOVA, a geometry-free world model that employs a dual-causal autoregressive framework. It follows both scale-wise and temporal topological orders in the autoregressive process, and leverages global attention for unified 4D spatio-temporal reasoning. Different from existing works that impose strong 3D geometric priors, RAYNOVA constructs an isotropic spatio-temporal representation across views, frames, and scales based on relative Plücker-ray positional encoding, enabling robust generalization to diverse camera setups and ego motions. We further introduce a recurrent training paradigm to alleviate distribution drift in long-horizon video generation. RAYNOVA achieves state-of-the-art multi-view video generation results on nuScenes, while offering higher throughput and strong controllability under diverse input conditions, generalizing to novel views and camera configurations without explicit 3D scene representation. Our code will be released at http://yichen928.github.io/raynova.
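Abstract (reference translation): World foundation models aim to simulate the evolution of the real world with physically plausible behavior. Unlike prior methods that treat spatial and temporal correlations separately, we propose RAYNOVA, a geometry-free world model that uses a dual-causal autoregressive framework. It follows both scale-wise and temporal topological orders in the autoregressive process and uses global attention for unified 4D spatio-temporal reasoning. Unlike existing work that imposes strong 3D geometric priors, RAYNOVA builds an isotropic spatio-temporal representation across views, frames, and scales based on relative Plücker-ray positional encoding, enabling robust generalization to diverse camera setups and ego motions. We also introduce a recurrent training paradigm to mitigate distribution drift in long-horizon video generation. RAYNOVA achieves state-of-the-art multi-view video generation results on nuScenes, offers higher throughput and strong controllability under diverse input conditions, and generalizes to novel views and camera configurations without an explicit 3D scene representation. Our code will be released at http://yichen928.github.io/raynova.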


Related papers