D$^2$-World: An Efficient World Model through Decoupled Dynamic Flow
- URL: http://arxiv.org/abs/2411.17027v1
- Date: Tue, 26 Nov 2024 01:42:49 GMT
- Title: D$^2$-World: An Efficient World Model through Decoupled Dynamic Flow
- Authors: Haiming Zhang, Xu Yan, Ying Xue, Zixuan Guo, Shuguang Cui, Zhen Li, Bingbing Liu,
- Abstract summary: This report summarizes the second-place solution for the Predictive World Model Challenge held at the CVPR-2024 Workshop on Foundation Models for Autonomous Systems.
We introduce D$2$-World, a novel World model that effectively forecasts future point clouds through Decoupled Dynamic flow.
Our approach achieves state-of-the-art performance on the OpenScene Predictive World Model benchmark, securing second place, and trains more than 300% faster than the baseline model.
- Score: 47.361822281431586
- License:
- Abstract: This technical report summarizes the second-place solution for the Predictive World Model Challenge held at the CVPR-2024 Workshop on Foundation Models for Autonomous Systems. We introduce D$^2$-World, a novel World model that effectively forecasts future point clouds through Decoupled Dynamic flow. Specifically, the past semantic occupancies are obtained via existing occupancy networks (e.g., BEVDet). Following this, the occupancy results serve as the input for a single-stage world model, generating future occupancy in a non-autoregressive manner. To further simplify the task, dynamic voxel decoupling is performed in the world model. The model generates future dynamic voxels by warping the existing observations through voxel flow, while remaining static voxels can be easily obtained through pose transformation. As a result, our approach achieves state-of-the-art performance on the OpenScene Predictive World Model benchmark, securing second place, and trains more than 300% faster than the baseline model. Code is available at https://github.com/zhanghm1995/D2-World.
Related papers
- Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles [9.639797094021988]
We introduce Imagine-2-Drive, a framework that consists of two components, VISTAPlan and DPA.
DPA is a diffusion based policy to model multi-modal behaviors for trajectory prediction.
We significantly outperform the state of the art (SOTA) world models on standard driving metrics by 15% and 20% on Route Completion and Success Rate respectively.
arXiv Detail & Related papers (2024-11-15T13:17:54Z) - WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making [40.53824201182517]
This paper introduces WHALE, a framework for learning generalizable world models.
We present Whale-ST, a scalable spatial-temporal transformer-based world model with enhanced generalizability.
We also propose Whale-X, a 414M parameter world model trained on 970K trajectories from Open X-Embodiment datasets.
arXiv Detail & Related papers (2024-11-08T15:01:27Z) - DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model [14.996395953240699]
DOME is a diffusion-based world model that predicts future occupancy frames based on past occupancy observations.
The ability of this world model to capture the evolution of the environment is crucial for planning in autonomous driving.
arXiv Detail & Related papers (2024-10-14T12:24:32Z) - Masked Generative Priors Improve World Models Sequence Modelling Capabilities [19.700020499490137]
Masked Generative Modelling has emerged as a more efficient and superior inductive bias for modelling.
GIT-STORM demonstrates substantial performance gains in RL tasks on the Atari 100k benchmark.
We apply Transformer-based World Models to continuous action environments for the first time, addressing a significant gap in prior research.
arXiv Detail & Related papers (2024-10-10T11:52:07Z) - AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction [56.72301849123049]
We present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track in the nuScenes Open-Occ dataset challenge at CVPR 2024.
Our innovative approach involves a dual-stage framework that enhances 3D occupancy and flow predictions by incorporating adaptive forward view transformation and flow modeling.
Our method combines regression with classification to address scale variations in different scenes, and leverages predicted flow to warp current voxel features to future frames, guided by future frame ground truth.
arXiv Detail & Related papers (2024-07-01T16:32:15Z) - EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via
Self-Supervision [85.17951804790515]
EmerNeRF is a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes.
It simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping.
Our method achieves state-of-the-art performance in sensor simulation.
arXiv Detail & Related papers (2023-11-03T17:59:55Z) - STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
Storm achieves a mean human performance of $126.7%$ on the Atari $100$k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - Generative Modeling with Phase Stochastic Bridges [49.4474628881673]
Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs.
We introduce a novel generative modeling framework grounded in textbfphase space dynamics
Our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.
arXiv Detail & Related papers (2023-10-11T18:38:28Z) - Autoregressive Dynamics Models for Offline Policy Evaluation and
Optimization [60.73540999409032]
We show that expressive autoregressive dynamics models generate different dimensions of the next state and reward sequentially conditioned on previous dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
arXiv Detail & Related papers (2021-04-28T16:48:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.