The Trinity of Consistency as a Defining Principle for General World Models
- URL: http://arxiv.org/abs/2602.23152v1
- Date: Thu, 26 Feb 2026 16:15:55 GMT
- Title: The Trinity of Consistency as a Defining Principle for General World Models
- Authors: Jingxuan Wei, Siyuan Li, Yuhang Xu, Zheng Sun, Junjie Jiang, Hexuan Jin, Caijun Jia, Honghao He, Xinglong Xu, Xi bai, Chang Yu, Yumou Liu, Junnan Zhu, Xuanhe Zhou, Jintao Chen, Xiaobin Hu, Shancheng Pang, Bihui Yu, Ran He, Zhen Lei, Stan Z. Li, Conghui He, Shuicheng Yan, Cheng Tan
- Abstract summary: General World Models are capable of learning, simulating, and reasoning about objective physical laws. We propose a principled theoretical framework that defines the essential properties requisite for a General World Model. Our work establishes a principled pathway toward general world models, clarifying both the limitations of current systems and the architectural requirements for future progress.
- Score: 106.16462830681452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The construction of World Models capable of learning, simulating, and reasoning about objective physical laws constitutes a foundational challenge in the pursuit of Artificial General Intelligence. Recent advancements represented by video generation models like Sora have demonstrated the potential of data-driven scaling laws to approximate physical dynamics, while the emerging Unified Multimodal Model (UMM) offers a promising architectural paradigm for integrating perception, language, and reasoning. Despite these advances, the field still lacks a principled theoretical framework that defines the essential properties requisite for a General World Model. In this paper, we propose that a World Model must be grounded in the Trinity of Consistency: Modal Consistency as the semantic interface, Spatial Consistency as the geometric basis, and Temporal Consistency as the causal engine. Through this tripartite lens, we systematically review the evolution of multimodal learning, revealing a trajectory from loosely coupled specialized modules toward unified architectures that enable the synergistic emergence of internal world simulators. To complement this conceptual framework, we introduce CoW-Bench, a benchmark centered on multi-frame reasoning and generation scenarios. CoW-Bench evaluates both video generation models and UMMs under a unified evaluation protocol. Our work establishes a principled pathway toward general world models, clarifying both the limitations of current systems and the architectural requirements for future progress.
Related papers
- UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? [50.92401586025528]
Unified multimodal models have recently demonstrated strong generative capabilities, yet whether and when generation improves understanding remains unclear. We introduce UniG2U-Bench, a comprehensive benchmark categorizing generation-to-understanding (G2U) evaluation into 7 regimes and 30 subtasks.
arXiv Detail & Related papers (2026-03-03T18:36:16Z)
- Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks [43.59401259468559]
We argue that a robust world model should not be a loose collection of capabilities but a normative framework that integrally incorporates interaction, perception, symbolic reasoning, and spatial representation. This work aims to guide future research toward more general, robust, and principled models of the world.
arXiv Detail & Related papers (2026-02-02T04:42:44Z)
- A Mechanistic View on Video Generation as World Models: State and Dynamics [43.951972667861575]
This work proposes a novel taxonomy centered on two pillars: State Construction and Dynamics Modeling. By addressing these challenges, the field can evolve from generating visually plausible videos to building robust, general-purpose world simulators.
arXiv Detail & Related papers (2026-01-22T19:00:18Z)
- Natural Building Blocks for Structured World Models: Theory, Evidence, and Scaling [42.78591555984395]
We propose a framework that specifies the natural building blocks for structured world models. We examine Hidden Markov Models (HMMs) and linear switching dynamical systems (sLDS) as natural building blocks for discrete and continuous modeling. This modular approach supports both passive modeling (generation, forecasting) and active control (planning, decision-making) within the same architecture.
arXiv Detail & Related papers (2025-11-03T22:02:04Z)
- Co-Evolving Latent Action World Models [57.48921576959243]
Adapting pre-trained video models into controllable world models via latent actions is a promising step towards creating generalist world models. We propose CoLA-World, which for the first time successfully realizes this synergistic paradigm. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM.
arXiv Detail & Related papers (2025-10-30T12:28:40Z)
- World Models Should Prioritize the Unification of Physical and Social Dynamics [57.91940497010114]
This paper argues that the systematic, bidirectional unification of physical and social predictive capabilities is the next crucial frontier for world model development. We contend that comprehensive world models must holistically integrate objective physical laws with the subjective, evolving, and context-dependent nature of social dynamics.
arXiv Detail & Related papers (2025-10-24T07:42:37Z)
- Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models [93.1043186636177]
We explore the hypothesis that people use a combination of distributed and symbolic representations to construct bespoke mental models tailored to novel situations. We propose a computational implementation of this idea: a "Model Synthesis Architecture" (MSA). We evaluate our MSA as a model of human judgments on a novel reasoning dataset.
arXiv Detail & Related papers (2025-07-16T18:01:03Z)
- Learning Local Causal World Models with State Space Models and Attention [1.5498250598583487]
We show that an SSM can model the dynamics of a simple environment and learn a causal model at the same time. We pave the way for further experiments that lean into the strength of SSMs and further enhance them with causal awareness.
arXiv Detail & Related papers (2025-05-04T11:57:02Z)
- Aether: Geometric-Aware Unified World Modeling [49.33579903601599]
Aether is a unified framework that enables geometry-aware reasoning in world models. Our framework achieves zero-shot generalization in both action following and reconstruction tasks. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling.
arXiv Detail & Related papers (2025-03-24T17:59:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.