From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction
- URL: http://arxiv.org/abs/2510.19654v1
- Date: Wed, 22 Oct 2025 14:57:51 GMT
- Title: From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction
- Authors: Zhida Zhao, Talas Fu, Yifan Wang, Lijun Wang, Huchuan Lu
- Abstract summary: We introduce a new driving paradigm named Policy World Model (PWM). PWM integrates world modeling and trajectory planning within a unified architecture. Our method matches or exceeds state-of-the-art approaches that rely on multi-view and multi-modal inputs.
- Score: 57.56072009935036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite remarkable progress in driving world models, their potential for autonomous systems remains largely untapped: the world models are mostly learned for world simulation and decoupled from trajectory planning. While recent efforts aim to unify world modeling and planning in a single framework, the synergistic facilitation mechanism of world modeling for planning still requires further exploration. In this work, we introduce a new driving paradigm named Policy World Model (PWM), which not only integrates world modeling and trajectory planning within a unified architecture, but is also able to benefit planning using the learned world knowledge through the proposed action-free future state forecasting scheme. Through collaborative state-action prediction, PWM can mimic the human-like anticipatory perception, yielding more reliable planning performance. To facilitate the efficiency of video forecasting, we further introduce a dynamically enhanced parallel token generation mechanism, equipped with a context-guided tokenizer and an adaptive dynamic focal loss. Despite utilizing only front camera input, our method matches or exceeds state-of-the-art approaches that rely on multi-view and multi-modal inputs. Code and model weights will be released at https://github.com/6550Zhao/Policy-World-Model.
Related papers
- Co-Evolving Latent Action World Models [57.48921576959243]
Adapting pre-trained video models into controllable world models via latent actions is a promising step towards creating generalist world models. We propose CoLA-World, which for the first time successfully realizes this synergistic paradigm. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM.
arXiv Detail & Related papers (2025-10-30T12:28:40Z)
- WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning [52.36434784963598]
We introduce WorldPrediction, a video-based benchmark for evaluating world modeling and procedural planning capabilities of different AI models. We show that current frontier models barely achieve 57% accuracy on WorldPrediction-WM and 38% on WorldPrediction-PP whereas humans are able to solve both tasks perfectly.
arXiv Detail & Related papers (2025-06-04T18:22:40Z)
- A Survey of World Models for Autonomous Driving [55.520179689933904]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling. World models offer high-fidelity representations of the driving environment that integrate multi-sensor data, semantic cues, and temporal dynamics. Future research must address key challenges in self-supervised representation learning, multimodal fusion, and advanced simulation.
arXiv Detail & Related papers (2025-01-20T04:00:02Z)
- DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers [61.92571851411509]
We introduce a multimodal driving language based on interleaved image and action tokens, and develop DrivingGPT to learn joint world modeling and planning. Our DrivingGPT demonstrates strong performance in both action-conditioned video generation and end-to-end planning, outperforming strong baselines on large-scale nuPlan and NAVSIM benchmarks.
arXiv Detail & Related papers (2024-12-24T18:59:37Z)
- Making Large Language Models into World Models with Precondition and Effect Knowledge [1.8561812622368763]
We show that Large Language Models (LLMs) can be induced to perform two critical world model functions.
We validate that the precondition and effect knowledge generated by our models aligns with human understanding of world dynamics.
arXiv Detail & Related papers (2024-09-18T19:28:04Z)
- World Models via Policy-Guided Trajectory Diffusion [21.89154719069519]
Existing world models are autoregressive in that they interleave predicting the next state with sampling the next action from the policy.
We propose a novel world modelling approach that is not autoregressive and generates entire on-policy trajectories in a single pass through a diffusion model.
arXiv Detail & Related papers (2023-12-13T21:46:09Z)
- Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving [56.381918362410175]
Drive-WM is the first driving world model compatible with existing end-to-end planning models.
Our model generates high-fidelity multiview videos in driving scenes.
arXiv Detail & Related papers (2023-11-29T18:59:47Z)
- Evolutionary Planning in Latent Space [7.863826008567604]
Planning is a powerful approach to reinforcement learning with several desirable properties.
We learn a world model that enables Evolutionary Planning in Latent Space.
We show how to build a model of the world by bootstrapping it with rollouts from a random policy and iteratively refining it with rollouts from an increasingly accurate planning policy.
arXiv Detail & Related papers (2020-11-23T09:21:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.