GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
- URL: http://arxiv.org/abs/2505.22421v2
- Date: Thu, 29 May 2025 12:41:53 GMT
- Title: GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
- Authors: Anthony Chen, Wenzhao Zheng, Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Kurt Keutzer, Shanghang Zhang
- Abstract summary: We introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models. We propose a dynamic editing module during training to enhance the renderings by editing the positions of the vehicles. Our method significantly outperforms existing models in both action accuracy and 3D spatial awareness.
- Score: 50.67481583744243
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in world models have revolutionized dynamic environment simulation, allowing systems to foresee future states and assess potential actions. In autonomous driving, these capabilities help vehicles anticipate the behavior of other road users, perform risk-aware planning, accelerate training in simulation, and adapt to novel scenarios, thereby enhancing safety and reliability. Current approaches either fail to maintain robust 3D geometric consistency or accumulate artifacts when handling occlusions, both of which are critical for reliable safety assessment in autonomous navigation tasks. To address this, we introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models to enhance spatial understanding and action controllability. Specifically, we first extract a 3D representation from the input frame and then obtain its 2D rendering based on the user-specified ego-car trajectory. To enable dynamic modeling, we propose a dynamic editing module during training that enhances the renderings by editing the positions of the vehicles. Extensive experiments demonstrate that our method significantly outperforms existing models in both action accuracy and 3D spatial awareness, leading to more realistic, adaptable, and reliable scene modeling for safer autonomous driving. Additionally, our model can generalize to novel trajectories and offers interactive scene editing capabilities, such as object editing and object trajectory control.
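To make the pipeline described in the abstract concrete, here is a minimal sketch of the geometry-conditioning idea: lift the input frame to a 3D point cloud using a (predicted) depth map, then reproject it along the user-specified ego trajectory to obtain the 2D renderings that would condition generation. The pinhole model, painter's-algorithm splatting, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) into a camera-frame point cloud (H*W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def render(points, colors, world_to_cam, K, hw):
    """Splat a colored point cloud into a novel camera pose on the trajectory."""
    H, W = hw
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (world_to_cam @ pts_h.T).T[:, :3]
    front = cam[:, 2] > 1e-3                      # keep points in front of camera
    cam, col = cam[front], colors[front]
    u = np.round(K[0, 0] * cam[:, 0] / cam[:, 2] + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * cam[:, 1] / cam[:, 2] + K[1, 2]).astype(int)
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z, col = u[ok], v[ok], cam[ok, 2], col[ok]
    order = np.argsort(-z)                        # far-to-near: near points win
    img = np.zeros((H, W, 3), dtype=np.float32)
    img[v[order], u[order]] = col[order]          # painter's algorithm splat
    return img

# One conditioning frame per pose of the user-specified ego trajectory.
depth = np.full((96, 160), 10.0)                  # stand-in for predicted depth
rgb = np.random.rand(96 * 160, 3)
K = np.array([[100.0, 0, 80], [0, 100.0, 48], [0, 0, 1]])
points = backproject(depth, K)                    # input frame -> 3D representation
pose = np.eye(4); pose[2, 3] = -2.0               # e.g. ego moves 2 m forward
conditioning = render(points, rgb, pose, K, (96, 160))
```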
Related papers
- AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving [29.420887070252274]
We introduce AD-GS, a novel self-supervised framework for high-quality free-viewpoint rendering of driving scenes from a single log.
At its core is a novel learnable motion model that integrates locality-aware B-spline curves with global-aware trigonometric functions.
Our model incorporates visibility reasoning and physically rigid regularization to enhance robustness.
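A compact sketch of what such a motion model could look like: a cubic B-spline supplies the locality-aware term, and low-frequency sine/cosine components supply the global-aware term. The SciPy-based spline, parameter shapes, and toy values are our assumptions, not AD-GS's implementation.

```python
import numpy as np
from scipy.interpolate import BSpline

def make_motion_model(ctrl_pts, knots, fourier_coeffs, periods):
    """Per-object position model: B-spline (local detail) + trig terms (global trend).

    ctrl_pts:       (C, 3) spline control points (learnable in the paper's setting)
    knots:          clamped knot vector of length C + 4 for a cubic spline
    fourier_coeffs: (F, 2, 3) sine/cosine coefficients per frequency
    periods:        (F,) period of each trigonometric component
    """
    spline = BSpline(knots, ctrl_pts, k=3)

    def position(t):
        tv = np.atleast_1d(np.asarray(t, dtype=float))
        local = spline(tv)                                   # (T, 3) locality-aware term
        phase = 2 * np.pi * tv[:, None] / np.asarray(periods)[None, :]
        glob = (np.sin(phase)[:, :, None] * fourier_coeffs[None, :, 0, :]
                + np.cos(phase)[:, :, None] * fourier_coeffs[None, :, 1, :]).sum(axis=1)
        return local + glob                                  # (T, 3) total position
    return position

# Toy object: 6 control points on a gentle left turn plus one slow lateral oscillation.
ctrl = np.linspace([0, 0, 0], [30, 5, 0], 6)
knots = np.r_[np.zeros(4), 1/3, 2/3, np.ones(4)]             # clamped cubic, C = 6
coeffs = np.zeros((1, 2, 3)); coeffs[0, 0, 1] = 0.3          # small lateral sine term
pos = make_motion_model(ctrl, knots, coeffs, periods=np.array([2.0]))
print(pos(np.linspace(0, 1, 5)))
```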
arXiv Detail & Related papers (2025-07-16T11:10:57Z)
- DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving [20.197094443215963]
We present DriveX, a self-supervised world model that learns general scene dynamics and holistic representations from driving videos.
DriveX introduces Omni Scene Modeling (OSM), a module that unifies multimodal supervision: 3D point cloud forecasting, 2D semantic representation, and image generation.
For downstream adaptation, we design Future Spatial Attention (FSA), a unified paradigm that dynamically aggregates features from DriveX's predictions to enhance task-specific inference.
arXiv Detail & Related papers (2025-05-25T17:27:59Z)
- CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving [25.156989992025625]
We introduce a novel spatial adaptive generation framework, CoGen, to achieve controllable multi-view videos with high 3D consistency.
By replacing coarse 2D conditions with fine-grained 3D representations, our approach significantly enhances the spatial consistency of the generated videos.
Results demonstrate that this method excels in preserving geometric fidelity and visual realism, offering a reliable video generation solution for autonomous driving.
arXiv Detail & Related papers (2025-03-28T08:27:05Z)
- Physical Informed Driving World Model [47.04423342994622]
DrivePhysica is an innovative model designed to generate realistic driving videos that adhere to essential physical principles.
We achieve state-of-the-art performance in driving video generation quality (3.96 FID and 38.06 FVD on the nuScenes dataset) and downstream perception tasks.
arXiv Detail & Related papers (2024-12-11T14:29:35Z)
- Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model [83.31688383891871]
We propose a Spatial-Temporal simulAtion for drivinG (Stag-1) model to reconstruct real-world scenes.
Stag-1 constructs continuous 4D point cloud scenes using surround-view data from autonomous vehicles.
It decouples spatial-temporal relationships and produces coherent driving videos.
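A minimal sketch of how a time-stamped ("4D") point cloud can be aggregated from surround-view data, assuming per-camera depth maps, intrinsics, and poses are given; Stag-1's actual reconstruction pipeline is more involved.

```python
import numpy as np

def aggregate_4d_points(frames):
    """Fuse per-camera depth maps into one time-stamped world-frame cloud.

    frames: list of dicts with keys
      'depth' (H, W), 'K' (3, 3) intrinsics,
      'cam_to_world' (4, 4) pose, 't' timestamp in seconds.
    Returns an (N, 4) array of [x, y, z, t] points.
    """
    clouds = []
    for f in frames:
        H, W = f['depth'].shape
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        z = f['depth'].reshape(-1)
        # Same pinhole lift as in the GeoDrive sketch above.
        x = (u.reshape(-1) - f['K'][0, 2]) * z / f['K'][0, 0]
        y = (v.reshape(-1) - f['K'][1, 2]) * z / f['K'][1, 1]
        pts = np.stack([x, y, z, np.ones_like(z)], axis=1)
        world = (f['cam_to_world'] @ pts.T).T[:, :3]      # camera -> world frame
        t = np.full((len(world), 1), f['t'])
        clouds.append(np.concatenate([world, t], axis=1)) # append the time axis
    return np.concatenate(clouds, axis=0)

# Example with two time steps of a single forward camera:
K = np.array([[100.0, 0, 2], [0, 100.0, 2], [0, 0, 1]])
frames = [dict(depth=np.full((4, 4), 5.0), K=K, cam_to_world=np.eye(4), t=0.0),
          dict(depth=np.full((4, 4), 5.0), K=K, cam_to_world=np.eye(4), t=0.1)]
print(aggregate_4d_points(frames).shape)                  # (32, 4)
```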
arXiv Detail & Related papers (2024-12-06T18:59:56Z)
- Learning Terrain-Aware Kinodynamic Model for Autonomous Off-Road Rally Driving With Model Predictive Path Integral Control [4.23755398158039]
We propose a method for learning a terrain-aware kinodynamic model conditioned on both proprioceptive and exteroceptive information.
The proposed model generates reliable predictions of 6-degree-of-freedom motion and can even estimate contact interactions.
We demonstrate the effectiveness of our approach through experiments on a simulated off-road track, showing that our proposed model-controller pair outperforms the baseline.
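The controller named here, Model Predictive Path Integral (MPPI) control, admits a short generic sketch: sample noisy control sequences, roll them out through the (learned) dynamics, and re-weight the nominal controls by exponentiated negative cost. This is the textbook update, not the paper's exact formulation; the dynamics argument stands in for the learned terrain-aware kinodynamic model.

```python
import numpy as np

def mppi_step(x0, u_nom, dynamics, cost, n_samples=256, sigma=0.5, lam=1.0,
              rng=np.random.default_rng(0)):
    """One MPPI update of the nominal control sequence.

    x0:       current state
    u_nom:    (T, U) nominal control sequence over the horizon
    dynamics: f(x, u) -> next state (here, the learned kinodynamic model)
    cost:     c(x, u) -> scalar running cost
    """
    T, U = u_nom.shape
    noise = rng.normal(0.0, sigma, size=(n_samples, T, U))
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        x = x0
        for t in range(T):
            u = u_nom[t] + noise[k, t]
            x = dynamics(x, u)
            costs[k] += cost(x, u)
    w = np.exp(-(costs - costs.min()) / lam)     # softmin weights over rollouts
    w /= w.sum()
    return u_nom + np.einsum('k,ktu->tu', w, noise)   # weighted-noise update

# Toy double-integrator example; the paper would use the learned 6-DoF model.
f = lambda x, u: x + 0.1 * np.array([x[1], u[0]])
c = lambda x, u: x[0] ** 2 + 0.1 * u[0] ** 2
u = np.zeros((20, 1))
for _ in range(10):
    u = mppi_step(np.array([2.0, 0.0]), u, f, c)
```

In a receding-horizon loop, one executes the first control of the returned sequence, shifts the sequence by one step, and repeats.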
arXiv Detail & Related papers (2023-05-01T06:09:49Z)
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach termed Drive-3DAug, aiming to augment driving scenes on camera in 3D space.
We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects.
Then, augmented driving scenes can be obtained by placing the 3D objects, with adapted locations and orientations, in pre-defined valid regions of the backgrounds.
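The placement step admits a short sketch, assuming the reconstructed foreground object is available as a point set and the valid region is an axis-aligned ground-plane box (both assumptions are ours):

```python
import numpy as np

def place_object(obj_pts, valid_region, rng=np.random.default_rng(0)):
    """Drop a reconstructed object into a scene at a sampled ground-plane pose.

    obj_pts:      (N, 3) object point cloud, centered at its own origin
    valid_region: dict with 'xy_min', 'xy_max' ground-plane bounds
    """
    xy = rng.uniform(valid_region['xy_min'], valid_region['xy_max'])
    yaw = rng.uniform(0.0, 2 * np.pi)
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotate about up-axis
    return obj_pts @ R.T + np.array([xy[0], xy[1], 0.0])

obj = np.random.rand(100, 3) - [0.5, 0.5, 0.0]                  # toy object blob
region = {'xy_min': np.array([5.0, -3.0]), 'xy_max': np.array([40.0, 3.0])}
augmented_fg = place_object(obj, region)
```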
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
- TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction [149.5716746789134]
We show that data-driven traffic simulation can be formulated as a world model.
We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving.
Experiments on the Waymo Open Motion Dataset show that TrafficBots can simulate realistic multi-agent behaviors.
arXiv Detail & Related papers (2023-03-07T18:28:41Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
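As an illustration of attention-based sensor fusion (the actual TransFuser interleaves transformer modules at several feature resolutions in both backbones), the PyTorch sketch below applies joint self-attention over concatenated image and LiDAR tokens at a single resolution:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Minimal attention-based fusion of image and LiDAR feature tokens."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, lidar_tokens):
        # img_tokens: (B, N_img, dim); lidar_tokens: (B, N_lidar, dim)
        tokens = torch.cat([img_tokens, lidar_tokens], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)       # joint self-attention
        fused = self.norm(tokens + fused)                  # residual + norm
        n_img = img_tokens.shape[1]
        return fused[:, :n_img], fused[:, n_img:]          # split back per sensor

# Example: fuse 64 image tokens with 64 BEV LiDAR tokens.
img_f, lidar_f = AttentionFusion()(torch.randn(2, 64, 256), torch.randn(2, 64, 256))
```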
arXiv Detail & Related papers (2021-04-19T11:48:13Z)