ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving
- URL: http://arxiv.org/abs/2602.10884v1
- Date: Wed, 11 Feb 2026 14:12:26 GMT
- Title: ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving
- Authors: Jinqing Zhang, Zehua Fu, Zelin Xu, Wenying Dai, Qingjie Liu, Yunhong Wang,
- Abstract summary: We propose Temporal Residual World Model (TR-World), which focuses on dynamic object modeling.<n>By calculating the temporal residuals of scene representations, the information of dynamic objects can be extracted without relying on detection and tracking.<n>We also propose Future-Guided Trajectory Refinement (FGTR) module, which conducts interaction between prior trajectories and the future BEV features.
- Score: 40.28153843744977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The comprehensive understanding capabilities of world models for driving scenarios have significantly improved the planning accuracy of end-to-end autonomous driving frameworks. However, the redundant modeling of static regions and the lack of deep interaction with trajectories hinder world models from exerting their full effectiveness. In this paper, we propose Temporal Residual World Model (TR-World), which focuses on dynamic object modeling. By calculating the temporal residuals of scene representations, the information of dynamic objects can be extracted without relying on detection and tracking. TR-World takes only temporal residuals as input, thus predicting the future spatial distribution of dynamic objects more precisely. By combining the prediction with the static object information contained in the current BEV features, accurate future BEV features can be obtained. Furthermore, we propose Future-Guided Trajectory Refinement (FGTR) module, which conducts interaction between prior trajectories (predicted from the current scene representation) and the future BEV features. This module can not only utilize future road conditions to refine trajectories, but also provides sparse spatial-temporal supervision on future BEV features to prevent world model collapse. Comprehensive experiments conducted on the nuScenes and NAVSIM datasets demonstrate that our method, namely ResWorld, achieves state-of-the-art planning performance. The code is available at https://github.com/mengtan00/ResWorld.git.
Related papers
- Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models [28.777224599594717]
Implicit Residual World Model focuses on modeling the current state and evolution of the world.<n> IR-WM achieves top performance in both 4D occupancy forecasting and trajectory planning.
arXiv Detail & Related papers (2025-10-19T06:45:37Z) - End-to-End Driving with Online Trajectory Evaluation via BEV World Model [52.10633338584164]
We propose an end-to-end driving framework WoTE, which leverages a BEV World model to predict future BEV states for Trajectory Evaluation.<n>We validate our framework on the NAVSIM benchmark and the closed-loop Bench2Drive benchmark based on the CARLA simulator, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-04-02T17:47:23Z) - An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training [50.71892161377806]
DFIT-OccWorld is an efficient 3D occupancy world model that leverages decoupled dynamic flow and image-assisted training strategy.<n>Our model forecasts future dynamic voxels by warping existing observations using voxel flow, whereas static voxels are easily obtained through pose transformation.
arXiv Detail & Related papers (2024-12-18T12:10:33Z) - ACT-Bench: Towards Action Controllable World Models for Autonomous Driving [2.6749009435602122]
World models have emerged as promising neural simulators for autonomous driving.<n>We develop an open-access evaluation framework, ACT-Bench, for quantifying action fidelity.<n>We demonstrate that the state-of-the-art model does not fully adhere to given instructions, while Terra achieves improved action fidelity.
arXiv Detail & Related papers (2024-12-06T01:06:28Z) - Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving [15.100104512786107]
Drive-OccWorld adapts a visioncentric- 4D forecasting world model to end-to-end planning for autonomous driving.<n>We propose injecting flexible action conditions, such as velocity, steering angle, trajectory, and commands, into the world model to enable controllable generation.<n>Our method can generate plausible and controllable 4D occupancy, paving the way for advancements in driving world generation and end-to-end planning.
arXiv Detail & Related papers (2024-08-26T11:53:09Z) - Physics-guided Active Sample Reweighting for Urban Flow Prediction [75.24539704456791]
Urban flow prediction is a nuanced-temporal modeling that estimates the throughput of transportation services like buses, taxis and ride-driven models.
Some recent prediction solutions bring remedies with the notion of physics-guided machine learning (PGML)
We develop a atized physics-guided network (PN), and propose a data-aware framework Physics-guided Active Sample Reweighting (P-GASR)
arXiv Detail & Related papers (2024-07-18T15:44:23Z) - UnO: Unsupervised Occupancy Fields for Perception and Forecasting [33.205064287409094]
Supervised approaches leverage annotated object labels to learn a model of the world.
We learn to perceive and forecast a continuous 4D occupancy field with self-supervision from LiDAR data.
This unsupervised world model can be easily and effectively transferred to tasks.
arXiv Detail & Related papers (2024-06-12T23:22:23Z) - Predictive World Models from Real-World Partial Observations [66.80340484148931]
We present a framework for learning a probabilistic predictive world model for real-world road environments.
While prior methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only.
arXiv Detail & Related papers (2023-01-12T02:07:26Z) - Predicting Future Occupancy Grids in Dynamic Environment with
Spatio-Temporal Learning [63.25627328308978]
We propose a-temporal prediction network pipeline to generate future occupancy predictions.
Compared to current SOTA, our approach predicts occupancy for a longer horizon of 3 seconds.
We publicly release our grid occupancy dataset based on nulis to support further research.
arXiv Detail & Related papers (2022-05-06T13:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.