SmallWorlds: Assessing Dynamics Understanding of World Models in Isolated Environments
- URL: http://arxiv.org/abs/2511.23465v1
- Date: Fri, 28 Nov 2025 18:56:02 GMT
- Title: SmallWorlds: Assessing Dynamics Understanding of World Models in Isolated Environments
- Authors: Xinyi Li, Zaishuo Xia, Weyl Lu, Chenjie Hao, Yubei Chen
- Abstract summary: We introduce the SmallWorld Benchmark, a testbed designed to assess world model capability under isolated and precisely controlled dynamics. We conduct comprehensive experiments in the fully observable state space on representative architectures including Recurrent State Space Model, Transformer, Diffusion model, and Neural ODE. The experimental results reveal how effectively these models capture environment structure and how their predictions deteriorate over extended rollouts.
- Score: 15.243547292947397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current world models lack a unified and controlled setting for systematic evaluation, making it difficult to assess whether they truly capture the underlying rules that govern environment dynamics. In this work, we address this open challenge by introducing the SmallWorld Benchmark, a testbed designed to assess world model capability under isolated and precisely controlled dynamics without relying on handcrafted reward signals. Using this benchmark, we conduct comprehensive experiments in the fully observable state space on representative architectures including Recurrent State Space Model, Transformer, Diffusion model, and Neural ODE, examining their behavior across six distinct domains. The experimental results reveal how effectively these models capture environment structure and how their predictions deteriorate over extended rollouts, highlighting both the strengths and limitations of current modeling paradigms and offering insights into future improvement directions in representation learning and dynamics modeling.
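The abstract's central measurement, how a model's predictions deteriorate when it is rolled out on its own outputs, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the benchmark's actual domains, architectures, or metrics: a hypothetical frictionless oscillator stands in for an isolated, fully observable domain, a least-squares linear map stands in for a learned world model, and per-step squared error stands in for the evaluation metric. The helper names (`true_step`, `collect_transitions`, `fit_linear_model`, `rollout_errors`) are hypothetical.

```python
import numpy as np

def true_step(state, dt=0.1):
    """Hypothetical ground-truth dynamics: a frictionless harmonic oscillator
    with fully observable state (position, velocity), advanced with
    semi-implicit (symplectic) Euler."""
    x, v = state
    v_next = v - dt * x
    x_next = x + dt * v_next
    return np.array([x_next, v_next])

def collect_transitions(n, rng, noise=1e-2):
    """Sample random states and record noisy one-step transitions."""
    states = rng.uniform(-1.0, 1.0, size=(n, 2))
    next_states = np.array([true_step(s) for s in states])
    next_states += noise * rng.standard_normal(next_states.shape)
    return states, next_states

def fit_linear_model(states, next_states):
    """Least-squares fit of s_{t+1} ~= s_t @ W, a stand-in for a learned world model."""
    W, *_ = np.linalg.lstsq(states, next_states, rcond=None)
    return W

def rollout_errors(W, s0, horizon):
    """Feed the model its own predictions for `horizon` steps and return the
    squared error against the ground-truth trajectory at each step."""
    pred, true = s0.copy(), s0.copy()
    errors = []
    for _ in range(horizon):
        pred = pred @ W          # open-loop model prediction
        true = true_step(true)   # ground-truth environment step
        errors.append(float(np.sum((pred - true) ** 2)))
    return np.array(errors)

rng = np.random.default_rng(0)
S, S_next = collect_transitions(500, rng)
W = fit_linear_model(S, S_next)
errs = rollout_errors(W, np.array([1.0, 0.0]), horizon=200)
print(f"step 1 error: {errs[0]:.2e}, step 200 error: {errs[-1]:.2e}")
```

Because the fitted map is only approximately correct, its small one-step error compounds once predictions are fed back as inputs, so the error late in the rollout is expected to be much larger than the one-step error. Comparing one-step accuracy with long-horizon drift in this way separates "fits the training transitions" from "captures the rule that generates them".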
Related papers
- WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models [114.95269118652163]
We introduce WorldArena, a unified benchmark designed to evaluate embodied world models across both perceptual and functional dimensions.
WorldArena assesses models through three dimensions: video perception quality, measured with 16 metrics across six sub-dimensions; embodied task functionality, which evaluates world models as data engines, policy evaluators, and action planners; and subjective human evaluation.
Through extensive experiments on 14 representative models, we reveal a significant perception-functionality gap, showing that high visual quality does not necessarily translate into strong embodied task capability.
arXiv Detail & Related papers (2026-02-09T18:09:20Z)
- stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation [46.55784222514516]
We introduce stable-worldmodel (SWM), a modular, tested, and documented world-model research ecosystem.
SWM provides efficient data-collection tools, standardized environments, planning algorithms, and baseline implementations.
We demonstrate the utility of SWM by using it to study zero-shot robustness in DINO-WM.
arXiv Detail & Related papers (2026-02-09T18:04:22Z)
- Aligning Agentic World Models via Knowledgeable Experience Learning [68.85843641222186]
We introduce WorldMind, a framework that constructs a symbolic World Knowledge Repository by synthesizing environmental feedback.
WorldMind achieves superior performance compared to baselines with remarkable cross-model and cross-environment transferability.
arXiv Detail & Related papers (2026-01-19T17:33:31Z)
- A Step Toward World Models: A Survey on Robotic Manipulation [58.8419978790227]
We look at approaches that exhibit the core capabilities of world models through a review of methods in robotic manipulation.
We analyze their roles across perception, prediction, and control, identify key challenges and solutions, and distill the core components, capabilities, and functions that a fully realized world model should possess.
arXiv Detail & Related papers (2025-10-31T00:57:24Z)
- Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in their suboptimal predictive ability.
We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
- AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability [84.52205243353761]
Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment.
We investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation.
arXiv Detail & Related papers (2025-04-06T20:35:44Z)
- Adapting World Models with Latent-State Dynamics Residuals [10.892848566977369]
ReDRAW is a latent-state autoregressive world model pretrained in simulation and calibrated to target environments.
It enables RL agents to be optimized with imagined rollouts under corrected dynamics and then deployed in the real world.
arXiv Detail & Related papers (2025-04-03T03:41:30Z)
- SPARTAN: A Sparse Transformer Learning Local Causation [63.29645501232935]
Causal structures play a central role in world models that flexibly adapt to changes in the environment.
We present the SPARse TrANsformer World model (SPARTAN), a Transformer-based world model that learns local causal structures between entities in a scene.
By applying sparsity regularisation on the attention pattern between object-factored tokens, SPARTAN identifies sparse local causal models that accurately predict future object states.
arXiv Detail & Related papers (2024-11-11T11:42:48Z)
- Zero-shot Safety Prediction for Autonomous Robots with Foundation World Models [0.12499537119440243]
A world model creates a surrogate world to train a controller and predict safety violations by learning the internal dynamic model of systems.
We propose foundation world models that embed observations into meaningful and causally latent representations.
This enables the surrogate dynamics to directly predict causal future states by leveraging a training-free large language model.
arXiv Detail & Related papers (2024-03-30T20:03:49Z)
- Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
arXiv Detail & Related papers (2021-10-27T04:27:28Z)