PhyWorldBench: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
- URL: http://arxiv.org/abs/2507.13428v1
- Date: Thu, 17 Jul 2025 17:54:09 GMT
- Title: PhyWorldBench: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
- Authors: Jing Gu, Xian Liu, Yu Zeng, Ashwin Nagarajan, Fangrui Zhu, Daniel Hong, Yue Fan, Qianqi Yan, Kaiwen Zhou, Ming-Yu Liu, Xin Eric Wang
- Abstract summary: PhyWorldBench is a benchmark designed to evaluate video generation models based on their adherence to the laws of physics. We introduce a novel "Anti-Physics" category, where prompts intentionally violate real-world physics. We evaluate 12 state-of-the-art text-to-video generation models, including five open-source and five proprietary models.
- Score: 38.14213802594432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video generation models have achieved remarkable progress in creating high-quality, photorealistic content. However, their ability to accurately simulate physical phenomena remains a critical and unresolved challenge. This paper presents PhyWorldBench, a comprehensive benchmark designed to evaluate video generation models based on their adherence to the laws of physics. The benchmark covers multiple levels of physical phenomena, ranging from fundamental principles like object motion and energy conservation to more complex scenarios involving rigid body interactions and human or animal motion. Additionally, we introduce a novel "Anti-Physics" category, where prompts intentionally violate real-world physics, enabling the assessment of whether models can follow such instructions while maintaining logical consistency. Besides large-scale human evaluation, we also design a simple yet effective method that uses current multimodal large language models (MLLMs) to evaluate physics realism in a zero-shot fashion. We evaluate 12 state-of-the-art text-to-video generation models, including five open-source and five proprietary models, with a detailed comparison and analysis. Through systematic testing of their outputs across 1,050 curated prompts spanning fundamental, composite, and anti-physics scenarios, we identify pivotal challenges these models face in adhering to real-world physics. We then rigorously examine their performance on diverse physical phenomena with varying prompt types, deriving targeted recommendations for crafting prompts that enhance fidelity to physical principles.
Related papers
- WorldBench: Disambiguating Physics for Diagnostic Evaluation of World Models [17.757245394765807]
We introduce WorldBench, a video-based benchmark specifically designed for concept-specific, disentangled evaluation. WorldBench offers a more nuanced and scalable framework for rigorously evaluating the physical reasoning capabilities of video generation and world models.
arXiv Detail & Related papers (2026-01-29T05:31:02Z)
- PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models [100.65199317765608]
Physical principles are fundamental to realistic visual simulation, but remain a significant oversight in transformer-based video generation. We introduce a physics-aware reinforcement learning paradigm for video generation models that enforces physical collision rules directly in high-dimensional spaces. We extend this paradigm to a unified framework, termed Mimicry-Discovery Cycle (MDcycle), which allows substantial fine-tuning.
arXiv Detail & Related papers (2026-01-16T08:40:10Z)
- ProPhy: Progressive Physical Alignment for Dynamic World Simulation [55.456455952212416]
ProPhy is a Progressive Physical Alignment Framework that enables explicit physics-aware conditioning and anisotropic generation. We show that ProPhy produces more realistic, dynamic, and physically coherent results than existing state-of-the-art methods.
arXiv Detail & Related papers (2025-12-05T09:39:26Z)
- PhysChoreo: Physics-Controllable Video Generation with Part-Aware Semantic Grounding [50.454084539837005]
PhysChoreo is a novel framework that can generate videos with diverse controllability and physical realism from a single image. Our method consists of two stages: first, it estimates the static initial physical properties of all objects in the image through part-aware physical property reconstruction. Then, through temporally instructed and physically editable simulation, it synthesizes high-quality videos with rich dynamic behaviors and physical realism.
arXiv Detail & Related papers (2025-11-25T17:59:04Z)
- PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis [52.905353023326306]
We propose PhysWorld, a framework that synthesizes physically plausible and diverse demonstrations to learn efficient world models. Experiments show that PhysWorld has competitive performance while enabling inference speeds 47 times faster than the recent state-of-the-art method, i.e., PhysTwin.
arXiv Detail & Related papers (2025-10-24T13:25:39Z)
- LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference [57.086932851733145]
We introduce LikePhys, a training-free method that evaluates intuitive physics in video diffusion models. We benchmark intuitive physics understanding in current video diffusion models. Empirical results show that, despite current models struggling with complex and chaotic dynamics, there is a clear trend of improvement in physics understanding as model capacity and inference settings scale.
arXiv Detail & Related papers (2025-10-13T15:19:07Z)
- SlotPi: Physics-informed Object-centric Reasoning Models [37.32107835829927]
We introduce SlotPi, a physics-informed object-centric reasoning model. Our experiments highlight the model's strengths in tasks such as prediction and Visual Question Answering (VQA) on benchmark and fluid datasets. We have created a real-world dataset encompassing object interactions, fluid dynamics, and fluid-object interactions, on which we validated our model's capabilities.
arXiv Detail & Related papers (2025-06-12T14:53:36Z)
- IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments [26.02187269408895]
IntPhys 2 is a video benchmark designed to evaluate the intuitive physics understanding of deep learning models. IntPhys 2 focuses on four core principles related to macroscopic objects: Permanence, Immutability, Spatio-Temporal Continuity, and Solidity.
arXiv Detail & Related papers (2025-06-11T15:21:16Z)
- PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis [62.283499219361595]
PhysGaia is a physics-aware dataset specifically designed for Dynamic Novel View Synthesis (DyNVS). Our dataset provides complex dynamic scenarios with rich interactions among multiple objects. PhysGaia will significantly advance research in dynamic view synthesis, physics-based scene understanding, and deep learning models integrated with physical simulation.
arXiv Detail & Related papers (2025-06-03T12:19:18Z)
- T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation [12.120541052871486]
Generative models produce high-quality videos that excel in aesthetic appeal and accurate instruction following. Many outputs still violate basic constraints such as rigid-body collisions, energy conservation, and gravitational dynamics. Existing physical-evaluation benchmarks rely on automatic, pixel-level metrics applied to simplistic, life-scenario prompts. We introduce T2VPhysBench, a first-principled benchmark that systematically evaluates whether state-of-the-art text-to-video systems obey twelve core physical laws.
arXiv Detail & Related papers (2025-05-01T06:34:55Z)
- PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models [33.45006997591683]
PHYBench is a benchmark of 500 original physics problems ranging from high school to Physics Olympiad difficulty. PHYBench addresses data contamination through original content and employs a systematic curation pipeline to eliminate flawed items. Evaluations show that PHYBench activates more tokens and provides stronger differentiation between reasoning models.
arXiv Detail & Related papers (2025-04-22T17:53:29Z)
- Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments [55.465371691714296]
We introduce Morpheus, a benchmark for evaluating video generation models on physical reasoning. It features 80 real-world videos capturing physical phenomena, guided by conservation laws. Our findings reveal that even with advanced prompting and video conditioning, current models struggle to encode physical principles.
arXiv Detail & Related papers (2025-04-03T15:21:17Z)
- PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos [21.441062722848265]
PhysTwin is a novel framework that uses sparse videos of dynamic objects under interaction to produce a photo- and physically realistic, real-time interactive replica. Our approach centers on two key components: (1) a physics-informed representation that combines spring-mass models for realistic physical simulation, generative shape models for geometry, and Gaussian splats for rendering. Our method integrates an inverse physics framework with visual perception cues, enabling high-fidelity reconstruction even from partial, occluded, and limited viewpoints.
arXiv Detail & Related papers (2025-03-23T07:49:19Z)
- Grounding Creativity in Physics: A Brief Survey of Physical Priors in AIGC [14.522189177415724]
Recent advancements in AI-generated content have significantly improved the realism of 3D and 4D generation. Most existing methods prioritize appearance consistency while neglecting underlying physical principles. This survey provides a review of physics-aware generative methods, systematically analyzing how physical constraints are integrated into 3D and 4D generation.
arXiv Detail & Related papers (2025-02-10T20:13:16Z)
- Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation [51.750634349748736]
Text-to-video (T2V) models have made significant strides in visualizing complex prompts.
However, the capacity of these models to accurately represent intuitive physics remains largely unexplored.
We introduce PhyGenBench to evaluate physical commonsense correctness in T2V generation.
arXiv Detail & Related papers (2024-10-07T17:56:04Z)
- VideoPhy: Evaluating Physical Commonsense for Video Generation [93.28748850301949]
We present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities.
We then generate videos conditioned on captions from diverse state-of-the-art text-to-video generative models.
Our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts.
arXiv Detail & Related papers (2024-06-05T17:53:55Z)
- ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that marries the particle-based physical dynamic models with the recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.