From Masks to Worlds: A Hitchhiker's Guide to World Models
- URL: http://arxiv.org/abs/2510.20668v1
- Date: Thu, 23 Oct 2025 15:46:44 GMT
- Title: From Masks to Worlds: A Hitchhiker's Guide to World Models
- Authors: Jinbin Bai, Yu Lei, Hecong Wu, Yuchen Zhu, Shufan Li, Yi Xin, Xiangtai Li, Molei Tao, Aditya Grover, Ming-Hsuan Yang
- Abstract summary: This is not a typical survey of world models; it is a guide for those who want to build worlds. We do not aim to catalog every paper that has ever mentioned a "world model".
- Score: 97.94109752910457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This is not a typical survey of world models; it is a guide for those who want to build worlds. We do not aim to catalog every paper that has ever mentioned a "world model". Instead, we follow one clear road: from early masked models that unified representation learning across modalities, to unified architectures that share a single paradigm, then to interactive generative models that close the action-perception loop, and finally to memory-augmented systems that sustain consistent worlds over time. We bypass loosely related branches to focus on the core: the generative heart, the interactive loop, and the memory system. We show that this is the most promising path towards true world models.
Related papers
- DreamWorld: Unified World Modeling in Video Generation [32.857497363728584]
We introduce DreamWorld, a unified framework that integrates complementary world knowledge into video generators. We show that DreamWorld improves world consistency, outperforming Wan2.1 by 2.26 points on VBench.
arXiv Detail & Related papers (2026-02-28T05:02:39Z)
- Co-Evolving Latent Action World Models [57.48921576959243]
Adapting pre-trained video models into controllable world models via latent actions is a promising step towards creating generalist world models. We propose CoLA-World, which for the first time successfully realizes this synergistic paradigm. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM.
arXiv Detail & Related papers (2025-10-30T12:28:40Z) - Can World Models Benefit VLMs for World Dynamics? [59.73433292793044]
We investigate the capabilities when world model priors are transferred into Vision-Language Models.<n>We name our best-performing variant Dynamic Vision Aligner (DyVA)<n>We find DyVA to surpass both open-source and proprietary baselines, achieving state-of-the-art or comparable performance.
arXiv Detail & Related papers (2025-10-01T13:07:05Z) - PoE-World: Compositional World Modeling with Products of Programmatic Experts [50.35012247866856]
Learning how the world works is central to building AI agents that can adapt to complex environments.<n>Recent advances in program synthesis using Large Language Models (LLMs) give an alternate approach which learns world models represented as source code.<n>We show that this approach can learn complex world models from just a few observations. We evaluate the learned world models by embedding them in a model-based planning agent, demonstrating efficient performance and generalization to unseen levels on Atari's Pong and Montezuma's Revenge.
arXiv Detail & Related papers (2025-05-16T03:28:42Z) - Understanding World or Predicting Future? A Comprehensive Survey of World Models [21.96900555014452]
This survey offers a comprehensive review of the literature on world models.<n>World models are regarded as tools for either understanding the present state of the world or predicting its future dynamics.
arXiv Detail & Related papers (2024-11-21T03:58:50Z) - Pandora: Towards General World Model with Natural Language Actions and Video States [61.30962762314734]
Pandora is a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions.
Pandora achieves domain generality, video consistency, and controllability through large-scale pretraining and instruction tuning.
arXiv Detail & Related papers (2024-06-12T18:55:51Z) - Evaluating the World Model Implicit in a Generative Model [7.317896355747284]
Recent work suggests that large language models may implicitly learn world models.
This includes problems as diverse as simple logical reasoning, geographic navigation, game-playing, and chemistry.
We propose new evaluation metrics for world model recovery inspired by the classic Myhill-Nerode theorem from language theory.
arXiv Detail & Related papers (2024-06-06T02:20:31Z) - Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond [101.15395503285804]
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI)
In this survey, we embark on a comprehensive exploration of the latest advancements in world models.
We examine challenges and limitations of world models, and discuss their potential future directions.
arXiv Detail & Related papers (2024-05-06T14:37:07Z) - WorldDreamer: Towards General World Models for Video Generation via
Predicting Masked Tokens [75.02160668328425]
We introduce WorldDreamer, a pioneering world model to foster a comprehensive comprehension of general world physics and motions.
WorldDreamer frames world modeling as an unsupervised visual sequence modeling challenge.
Our experiments show that WorldDreamer excels in generating videos across different scenarios, including natural scenes and driving environments.
arXiv Detail & Related papers (2024-01-18T14:01:20Z)