Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments
- URL: http://arxiv.org/abs/2503.08122v1
- Date: Tue, 11 Mar 2025 07:38:11 GMT
- Authors: Soonwoo Kwon, Jin-Young Kim, Hyojun Go, Kyungjune Baek
- Abstract summary: We present a novel study on enhancing the capability of preserving the content in world models, focusing on a property we term World Stability. Recent diffusion-based generative models have advanced the synthesis of immersive and realistic environments that are pivotal for applications such as reinforcement learning and interactive game engines. We introduce an evaluation framework that measures world stability by having world models perform a sequence of actions followed by their inverses to return to their initial viewpoint.
- Score: 9.870616615997973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel study on enhancing the capability of preserving the content in world models, focusing on a property we term World Stability. Recent diffusion-based generative models have advanced the synthesis of immersive and realistic environments that are pivotal for applications such as reinforcement learning and interactive game engines. However, while these models excel in quality and diversity, they often neglect the preservation of previously generated scenes over time--a shortfall that can introduce noise into agent learning and compromise performance in safety-critical settings. In this work, we introduce an evaluation framework that measures world stability by having world models perform a sequence of actions followed by their inverses to return to their initial viewpoint, thereby quantifying the consistency between the starting and ending observations. Our comprehensive assessment of state-of-the-art diffusion-based world models reveals significant challenges in achieving high world stability. Moreover, we investigate several improvement strategies to enhance world stability. Our results underscore the importance of world stability in world modeling and provide actionable insights for future research in this domain.
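The round-trip protocol described in the abstract (apply a sequence of actions, then their inverses, then compare the first and last observations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `step` interface, the two-action vocabulary, the toy one-dimensional "frames", and the mean-squared-error distance are all hypothetical choices made for brevity, and the paper's actual stability metric may differ.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]  # toy stand-in for an image observation (assumption)
StepFn = Callable[[Frame, str], Frame]

# Hypothetical action vocabulary: each action maps to its inverse.
INVERSE: Dict[str, str] = {"left": "right", "right": "left"}


def inverse_sequence(actions: Sequence[str]) -> List[str]:
    """Undo a sequence: invert each action and reverse the order."""
    return [INVERSE[a] for a in reversed(actions)]


def mean_squared_error(a: Frame, b: Frame) -> float:
    """Illustrative distance only; not claimed to be the paper's metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def round_trip_discrepancy(
    step: StepFn,
    initial: Frame,
    actions: Sequence[str],
    distance: Callable[[Frame, Frame], float] = mean_squared_error,
) -> float:
    """Roll out actions, then their inverses, and compare start and end."""
    obs = initial
    for action in list(actions) + inverse_sequence(actions):
        obs = step(obs, action)
    return distance(initial, obs)


def stable_step(obs: Frame, action: str) -> Frame:
    """Toy world: 'left' rotates the frame left, 'right' rotates it right."""
    shift = 1 if action == "left" else -1
    return obs[shift:] + obs[:shift]


def drifting_step(obs: Frame, action: str) -> Frame:
    """Same toy world, but every step adds a constant drift to the content."""
    return [value + 0.5 for value in stable_step(obs, action)]


frame = [0.0, 1.0, 2.0, 3.0]
print(round_trip_discrepancy(stable_step, frame, ["left", "left"]))    # 0.0
print(round_trip_discrepancy(drifting_step, frame, ["left", "left"]))  # 4.0
```

A discrepancy of zero means the toy world returned exactly to its starting observation; the drifting variant accumulates error over the four steps of the round trip, which is the kind of instability the evaluation is meant to expose.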
Related papers
- stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation [46.55784222514516]
We introduce stable-worldmodel (SWM), a modular, tested, and documented world-model research ecosystem.
SWM provides efficient data-collection tools, standardized environments, planning algorithms, and baseline implementations.
We demonstrate the utility of SWM by using it to study zero-shot robustness in DINO-WM.
arXiv Detail & Related papers (2026-02-09T18:04:22Z)
- From Word to World: Can Large Language Models be Implicit Text-based World Models? [82.47317196099907]
Agentic reinforcement learning increasingly relies on experience-driven scaling.
World models offer a potential way to improve learning efficiency through simulated experience.
We study whether large language models can reliably serve this role and under what conditions they meaningfully benefit agents.
arXiv Detail & Related papers (2025-12-21T17:28:42Z)
- WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World [100.68103378427567]
Generative world models are reshaping embodied AI, enabling agents to synthesize realistic 4D driving environments that look convincing but often fail physically or behaviorally.
We introduce WorldLens, a full-spectrum benchmark evaluating how well a model builds, understands, and behaves within its generated world.
We further construct WorldLens-26K, a large-scale dataset of human-annotated videos with numerical scores and textual rationales, and develop WorldLens-Agent.
arXiv Detail & Related papers (2025-12-11T18:59:58Z)
- SmallWorlds: Assessing Dynamics Understanding of World Models in Isolated Environments [15.243547292947397]
We introduce the SmallWorld Benchmark, a testbed designed to assess world model capability under isolated and precisely controlled dynamics.
We conduct comprehensive experiments in the fully observable state space on representative architectures including Recurrent State Space Model, Transformer, Diffusion model, and Neural ODE.
The experimental results reveal how effectively these models capture environment structure and how their predictions deteriorate over extended rollouts.
arXiv Detail & Related papers (2025-11-28T18:56:02Z)
- Clone Deterministic 3D Worlds with Geometrically-Regularized World Models [16.494281967592745]
World models are essential for enabling agents to think, plan, and reason effectively in complex, dynamic settings.
Despite rapid progress, current world models remain brittle and degrade over long horizons.
We propose Geometrically-Regularized World Models (GRWM), which enforces that consecutive points along a natural sensory trajectory remain close in latent representation space.
arXiv Detail & Related papers (2025-10-30T17:56:43Z)
- World Models for Cognitive Agents: Transforming Edge Intelligence in Future Networks [55.90051810762702]
We present a comprehensive overview of world models, highlighting their architecture, training paradigms, and applications across prediction, generation, planning, and causal reasoning.
We propose Wireless Dreamer, a novel world model-based reinforcement learning framework tailored for wireless edge intelligence optimization.
arXiv Detail & Related papers (2025-05-31T06:43:00Z)
- Learning Local Causal World Models with State Space Models and Attention [1.5498250598583487]
We show that a SSM can model the dynamics of a simple environment and learn a causal model at the same time.
We pave the way for further experiments that lean into the strength of SSMs and further enhance them with causal awareness.
arXiv Detail & Related papers (2025-05-04T11:57:02Z)
- AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability [84.52205243353761]
Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment.
We investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation.
arXiv Detail & Related papers (2025-04-06T20:35:44Z)
- Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models [13.216398753024182]
Large Language Models (LLMs) and Vision-Language Models (VLMs) have become essential to general artificial intelligence.
We propose a novel stability measure for LLMs inspired by statistical methods rooted in information geometry.
Our results demonstrate the utility of our measure in identifying salient parameters and detecting vulnerable regions in input images or critical dimensions in token embeddings.
arXiv Detail & Related papers (2025-03-28T16:23:59Z)
- AdaWorld: Learning Adaptable World Models with Latent Actions [76.50869178593733]
We propose AdaWorld, an innovative world model learning approach that enables efficient adaptation.
The key idea is to incorporate action information during the pretraining of world models.
We then develop an autoregressive world model that conditions on these latent actions.
arXiv Detail & Related papers (2025-03-24T17:58:15Z)
- A Survey of World Models for Autonomous Driving [63.33363128964687]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling.
This paper systematically reviews recent advances in world models for autonomous driving.
arXiv Detail & Related papers (2025-01-20T04:00:02Z)
- WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making [40.53824201182517]
This paper introduces WHALE, a framework for learning generalizable world models.
We present Whale-ST, a scalable spatial-temporal transformer-based world model with enhanced generalizability.
We also propose Whale-X, a 414M parameter world model trained on 970K trajectories from Open X-Embodiment datasets.
arXiv Detail & Related papers (2024-11-08T15:01:27Z)
- Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey [61.39993881402787]
World models and video generation are pivotal technologies in the domain of autonomous driving.
This paper investigates the relationship between these two technologies.
By analyzing the interplay between video generation and world models, this survey identifies critical challenges and future research directions.
arXiv Detail & Related papers (2024-11-05T08:58:35Z)
- Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond [101.15395503285804]
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI).
In this survey, we embark on a comprehensive exploration of the latest advancements in world models.
We examine challenges and limitations of world models, and discuss their potential future directions.
arXiv Detail & Related papers (2024-05-06T14:37:07Z)
- Zero-shot Safety Prediction for Autonomous Robots with Foundation World Models [0.12499537119440243]
A world model creates a surrogate world to train a controller and predict safety violations by learning the internal dynamic model of systems.
We propose foundation world models that embed observations into meaningful and causally latent representations.
This enables the surrogate dynamics to directly predict causal future states by leveraging a training-free large language model.
arXiv Detail & Related papers (2024-03-30T20:03:49Z)
- The Essential Role of Causality in Foundation World Models for Embodied AI [102.75402420915965]
Embodied AI agents will require the ability to perform new tasks in many different real-world environments.
Current foundation models fail to accurately model physical interactions and are therefore insufficient for Embodied AI.
The study of causality lends itself to the construction of veridical world models.
arXiv Detail & Related papers (2024-02-06T17:15:33Z)
- Improving Generative Imagination in Object-Centric World Models [20.495475118576604]
We introduce Generative Structured World Models (G-SWM).
G-SWM unifies the key properties of previous models in a principled framework.
It achieves two crucial new abilities, multimodal uncertainty and situation-awareness.
arXiv Detail & Related papers (2020-10-05T14:43:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.