Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos
- URL: http://arxiv.org/abs/2601.16471v1
- Date: Fri, 23 Jan 2026 06:02:07 GMT
- Title: Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos
- Authors: Meng Cao, Haoran Tang, Haoze Zhao, Mingfei Han, Ruyang Liu, Qiang Sun, Xiaojun Chang, Ian Reid, Xiaodan Liang,
- Abstract summary: We propose a novel paradigm that leverages glitches in gameplay videos, referring to visual anomalies that violate predefined physical laws, as a rich and scalable supervision source for physical world understanding.<n>We introduce PhysGame, a dataset containing 140,057 glitch-centric question-answer pairs across five physical domains and sixteen fine-grained categories.<n>Experiments show that PhysGame significantly enhances both Game2Real transferability, improving the real world physical reasoning performance of Qwen2.5VL by 2.5%, and Game2General transferability, yielding a 1.9% gain on the MVBench benchmark.
- Score: 82.4003989236851
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Understanding the physical world, including object dynamics, material properties, and causal interactions, remains a core challenge in artificial intelligence. Although recent multi-modal large language models (MLLMs) have demonstrated impressive general reasoning capabilities, they still fall short of achieving human-level understanding of physical principles. Existing datasets for physical reasoning either rely on real-world videos, which incur high annotation costs, or on synthetic simulations, which suffer from limited realism and diversity. In this paper, we propose a novel paradigm that leverages glitches in gameplay videos, referring to visual anomalies that violate predefined physical laws, as a rich and scalable supervision source for physical world understanding. We introduce PhysGame, an meta information guided instruction-tuning dataset containing 140,057 glitch-centric question-answer pairs across five physical domains and sixteen fine-grained categories. To ensure data accuracy, we design a prompting strategy that utilizes gameplay metadata such as titles and descriptions to guide high-quality QA generation. Complementing PhysGame, we construct GameBench, an expert-annotated benchmark with 880 glitch-identified gameplay videos designed to evaluate physical reasoning capabilities. Extensive experiments show that PhysGame significantly enhances both Game2Real transferability, improving the real world physical reasoning performance of Qwen2.5VL by 2.5% on PhysBench, and Game2General transferability, yielding a 1.9% gain on the MVBench benchmark. Moreover, PhysGame-tuned models achieve a 3.7% absolute improvement on GameBench, demonstrating enhanced robustness in detecting physical implausibilities. These results indicate that learning from gameplay anomalies offers a scalable and effective pathway toward advancing physical world understanding in multimodal intelligence.
Related papers
- PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding [50.454084539837005]
PhysChoreo is a novel framework that can generate videos with diverse controllability and physical realism from a single image.<n>Our method consists of two stages: first, it estimates the static initial physical properties of all objects in the image through part-aware physical property reconstruction.<n>Then, through temporally instructed and physically editable simulation, it synthesizes high-quality videos with rich dynamic behaviors and physical realism.
arXiv Detail & Related papers (2025-11-25T17:59:04Z) - PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis [52.905353023326306]
We propose PhysWorld, a framework that synthesizes physically plausible and diverse demonstrations to learn efficient world models.<n>Experiments show that PhysWorld has competitive performance while enabling inference speeds 47 times faster than the recent state-of-the-art method, i.e., PhysTwin.
arXiv Detail & Related papers (2025-10-24T13:25:39Z) - LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference [57.086932851733145]
We introduce LikePhys, a training-free method that evaluates intuitive physics in video diffusion models.<n>We benchmark intuitive physics understanding in current video diffusion models.<n> Empirical results show that, despite current models struggling with complex and chaotic dynamics, there is a clear trend of improvement in physics understanding as model capacity and inference settings scale.
arXiv Detail & Related papers (2025-10-13T15:19:07Z) - IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments [26.02187269408895]
IntPhys 2 is a video benchmark designed to evaluate the intuitive physics understanding of deep learning models.<n>IntPhys 2 focuses on four core principles related to macroscopic objects: Permanence, Immutability, Spatio-Temporal Continuity, and Solidity.
arXiv Detail & Related papers (2025-06-11T15:21:16Z) - VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation [66.58048825989239]
VideoPhy-2 is an action-centric dataset for evaluating physical commonsense in generated videos.<n>We perform human evaluation that assesses semantic adherence, physical commonsense, and grounding of physical rules in the generated videos.<n>Our findings reveal major shortcomings, with even the best model achieving only 22% joint performance.
arXiv Detail & Related papers (2025-03-09T22:49:12Z) - Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection [2.1013864820763755]
Humans detect real-world object anomalies by perceiving, interacting, and reasoning based on object-conditioned physical knowledge.<n>Phys-AD is the first large-scale, real-world, physics-grounded video dataset for industrial anomaly detection.<n>The dataset includes more than 6400 videos across 22 real-world object categories, interacting with robot arms and motors, and exhibits 47 types of anomalies.
arXiv Detail & Related papers (2025-03-05T14:49:08Z) - PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos [66.09921831504238]
We propose PhysGame as a pioneering benchmark to evaluate physical commonsense violations in gameplay videos.<n>Our findings reveal that the performance of current open-source video LLMs significantly lags behind that of proprietary counterparts.<n>Based on the suite of datasets, we propose PhysVLM as a physical knowledge-enhanced video LLM.
arXiv Detail & Related papers (2024-12-02T18:47:25Z) - Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation [51.750634349748736]
Text-to-video (T2V) models have made significant strides in visualizing complex prompts.
However, the capacity of these models to accurately represent intuitive physics remains largely unexplored.
We introduce PhyGenBench to evaluate physical commonsense correctness in T2V generation.
arXiv Detail & Related papers (2024-10-07T17:56:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.