From Generative Engines to Actionable Simulators: The Imperative of Physical Grounding in World Models
- URL: http://arxiv.org/abs/2601.15533v1
- Date: Wed, 21 Jan 2026 23:35:33 GMT
- Title: From Generative Engines to Actionable Simulators: The Imperative of Physical Grounding in World Models
- Authors: Zhikang Chen, Tingting Zhu
- Abstract summary: A world model is an AI system that simulates how an environment evolves under actions. Current world models suffer from visual conflation: the mistaken assumption that high-fidelity video generation implies an understanding of physical and causal dynamics. We show that while modern models excel at predicting pixels, they frequently violate invariant constraints, fail under intervention, and break down in safety-critical decision-making.
- Score: 4.52033729546524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A world model is an AI system that simulates how an environment evolves under actions, enabling planning through imagined futures rather than reactive perception. Current world models, however, suffer from visual conflation: the mistaken assumption that high-fidelity video generation implies an understanding of physical and causal dynamics. We show that while modern models excel at predicting pixels, they frequently violate invariant constraints, fail under intervention, and break down in safety-critical decision-making. This survey argues that visual realism is an unreliable proxy for world understanding. Instead, effective world models must encode causal structure, respect domain-specific constraints, and remain stable over long horizons. We propose a reframing of world models as actionable simulators rather than visual engines, emphasizing structured 4D interfaces, constraint-aware dynamics, and closed-loop evaluation. Using medical decision-making as an epistemic stress test, where trial-and-error is impossible and errors are irreversible, we demonstrate that a world model's value is determined not by how realistic its rollouts appear, but by its ability to support counterfactual reasoning, intervention planning, and robust long-horizon foresight.
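The abstract's framing of a world model as an actionable simulator, one that steps explicit state under actions and checks invariant constraints rather than judging pixel fidelity, can be illustrated with a minimal sketch. All class names, the toy point-mass dynamics, and the energy bound below are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of a world model used as an actionable simulator:
# state transitions are explicit, and each imagined step is checked
# against an invariant constraint rather than judged by visual fidelity.
# All names, the toy dynamics, and the energy bound are assumptions.

class ActionableSimulator:
    def __init__(self, dt=0.1):
        self.dt = dt  # integration timestep

    def step(self, state, action):
        """Advance a toy point-mass state (position, velocity) one step;
        the action is interpreted as an acceleration."""
        pos, vel = state
        vel = vel + action * self.dt
        pos = pos + vel * self.dt
        return (pos, vel)

    def energy(self, state):
        """Invariant proxy: kinetic energy of a unit point mass."""
        return 0.5 * state[1] ** 2


def rollout(model, state, actions, max_energy=100.0):
    """Imagine a future under a planned action sequence, flagging any
    step that violates the energy bound (a stand-in for the
    domain-specific constraints the survey argues models must respect)."""
    trajectory = [state]
    violations = []
    for t, a in enumerate(actions):
        state = model.step(state, a)
        if model.energy(state) > max_energy:
            violations.append(t)
        trajectory.append(state)
    return trajectory, violations


sim = ActionableSimulator()
traj, bad = rollout(sim, (0.0, 0.0), [1.0] * 20)
print(len(traj), bad)  # 21 imagined states, no constraint violations
```

The point of the sketch is the interface, not the dynamics: planning consumes the returned trajectory and violation list, so a rollout's usefulness is judged by constraint adherence, not appearance.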
Related papers
- Mirage2Matter: A Physically Grounded Gaussian World Model from Video [87.9732484393686]
We present Simulate Anything, a graphics-driven world modeling and simulation framework. Our approach reconstructs real-world environments into a photorealistic scene representation using 3D Gaussian Splatting (3DGS). We then leverage generative models to recover a physically realistic representation and integrate it into a simulation environment via a precision calibration target.
arXiv Detail & Related papers (2026-01-24T07:43:57Z) - Walk through Paintings: Egocentric World Models from Internet Priors [65.30611174953958]
We present the Egocentric World Model (EgoWM), a simple, architecture-agnostic method that transforms any pretrained video diffusion model into an action-conditioned world model. Rather than training from scratch, we repurpose the rich world priors of Internet-scale video models and inject motor commands through lightweight conditioning layers. Our approach scales naturally across embodiments and action spaces, ranging from 3-DoF mobile robots to 25-DoF humanoids.
arXiv Detail & Related papers (2026-01-21T18:59:32Z) - WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World [100.68103378427567]
Generative world models are reshaping embodied AI, enabling agents to synthesize realistic 4D driving environments that look convincing but often fail physically or behaviorally. We introduce WorldLens, a full-spectrum benchmark evaluating how well a model builds, understands, and behaves within its generated world. We further construct WorldLens-26K, a large-scale dataset of human-annotated videos with numerical scores and textual rationales, and develop WorldLens-Agent.
arXiv Detail & Related papers (2025-12-11T18:59:58Z) - Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model [12.257547810949482]
Embodied Tree of Thoughts (EToT) is a novel Real2Sim2Real planning framework. EToT formulates manipulation planning as a tree search expanded through two synergistic mechanisms. By grounding high-level reasoning in a physics simulator, our framework ensures that generated plans adhere to rigid-body dynamics and collision constraints.
arXiv Detail & Related papers (2025-12-09T02:36:26Z) - PAN: A World Model for General, Interactable, and Long-Horizon World Simulation [49.805071498152536]
We introduce PAN, a general, interactable, and long-horizon world model. It predicts future world states through high-quality video simulation conditioned on history and natural language actions. Experiments show that PAN achieves strong performance in action-conditioned world simulation, long-horizon forecasting, and simulative reasoning.
arXiv Detail & Related papers (2025-11-12T07:20:35Z) - Clone Deterministic 3D Worlds with Geometrically-Regularized World Models [16.494281967592745]
World models are essential for enabling agents to think, plan, and reason effectively in complex, dynamic settings. Despite rapid progress, current world models remain brittle and degrade over long horizons. We propose Geometrically-Regularized World Models (GRWM), which enforce that consecutive points along a natural sensory trajectory remain close in latent representation space.
arXiv Detail & Related papers (2025-10-30T17:56:43Z) - A Comprehensive Survey on World Models for Embodied AI [14.457261562275121]
Embodied AI requires agents that perceive, act, and anticipate how actions reshape future world states. This survey presents a unified framework for world models in embodied AI.
arXiv Detail & Related papers (2025-10-19T07:12:32Z) - WoW: Towards a World omniscient World model Through Embodied Interaction [83.43543124512719]
A world model's authentic physical intuition must be grounded in extensive, causally rich interactions with the real world. We present WoW, a generative world model trained on 2 million robot interaction trajectories. We establish WoWBench, a new benchmark focused on physical consistency and causal reasoning in video.
arXiv Detail & Related papers (2025-09-26T17:59:07Z) - AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability [84.52205243353761]
Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment. We investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation.
arXiv Detail & Related papers (2025-04-06T20:35:44Z) - Causal World Models by Unsupervised Deconfounding of Physical Dynamics [20.447000858907646]
The capability of imagining internally with a mental model of the world is vitally important for human cognition.
We propose Causal World Models (CWMs) that allow unsupervised modeling of relationships between the intervened and alternative futures.
We show reductions in sample complexity for reinforcement learning tasks and improvements in counterfactual physical reasoning.
arXiv Detail & Related papers (2020-12-28T13:44:36Z)
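One concrete mechanism from the list above, the GRWM regularizer that keeps latents of consecutive frames close, admits a simple sketch. Here latents are plain lists of floats and the penalty is a mean squared jump between neighbors; the encoder and the exact loss form are assumptions, not the paper's implementation:

```python
# Hypothetical sketch of GRWM's latent-proximity regularizer: penalize
# large jumps between latent codes of consecutive frames along a
# sensory trajectory. The mean-squared-jump form is an assumption.

def geometric_regularizer(latents):
    """Mean squared distance between consecutive latent vectors."""
    total, count = 0.0, 0
    for z_t, z_next in zip(latents, latents[1:]):
        total += sum((a - b) ** 2 for a, b in zip(z_t, z_next))
        count += 1
    return total / max(count, 1)


# A smooth trajectory incurs a small penalty; a jumpy one, a large one.
smooth = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1]]
jumpy = [[0.0, 0.0], [5.0, 0.0], [0.0, 4.0]]
print(geometric_regularizer(smooth) < geometric_regularizer(jumpy))  # True
```

In training, such a term would be added to the model's prediction loss, biasing the latent space toward the temporal smoothness of natural sensory streams.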
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.