Related papers: "PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models

"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models

URL: http://arxiv.org/abs/2507.13428v1
Date: Thu, 17 Jul 2025 17:54:09 GMT
Title: "PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
Authors: Jing Gu, Xian Liu, Yu Zeng, Ashwin Nagarajan, Fangrui Zhu, Daniel Hong, Yue Fan, Qianqi Yan, Kaiwen Zhou, Ming-Yu Liu, Xin Eric Wang,
Abstract summary: PhyWorldBench is a benchmark designed to evaluate video generation models based on their adherence to the laws of physics.<n>We introduce a novel ""Anti-Physics" category, where prompts intentionally violate real-world physics.<n>We evaluate 12 state-of-the-art text-to-video generation models, including five open-source and five proprietary models.
Score: 38.14213802594432
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video generation models have achieved remarkable progress in creating high-quality, photorealistic content. However, their ability to accurately simulate physical phenomena remains a critical and unresolved challenge. This paper presents PhyWorldBench, a comprehensive benchmark designed to evaluate video generation models based on their adherence to the laws of physics. The benchmark covers multiple levels of physical phenomena, ranging from fundamental principles like object motion and energy conservation to more complex scenarios involving rigid body interactions and human or animal motion. Additionally, we introduce a novel ""Anti-Physics"" category, where prompts intentionally violate real-world physics, enabling the assessment of whether models can follow such instructions while maintaining logical consistency. Besides large-scale human evaluation, we also design a simple yet effective method that could utilize current MLLM to evaluate the physics realism in a zero-shot fashion. We evaluate 12 state-of-the-art text-to-video generation models, including five open-source and five proprietary models, with a detailed comparison and analysis. we identify pivotal challenges models face in adhering to real-world physics. Through systematic testing of their outputs across 1,050 curated prompts-spanning fundamental, composite, and anti-physics scenarios-we identify pivotal challenges these models face in adhering to real-world physics. We then rigorously examine their performance on diverse physical phenomena with varying prompt types, deriving targeted recommendations for crafting prompts that enhance fidelity to physical principles.

Related papers

SlotPi: Physics-informed Object-centric Reasoning Models [37.32107835829927]
We introduce SlotPi, a physics-informed object-centric reasoning model.<n>Our experiments highlight the model's strengths in tasks such as prediction and Visual Question Answering (VQA) on benchmark and fluid datasets.<n>We have created a real-world dataset encompassing object interactions, fluid dynamics, and fluid-object interactions, on which we validated our model's capabilities.
arXiv Detail & Related papers (2025-06-12T14:53:36Z)
IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments [26.02187269408895]
IntPhys 2 is a video benchmark designed to evaluate the intuitive physics understanding of deep learning models.<n>IntPhys 2 focuses on four core principles related to macroscopic objects: Permanence, Immutability, Spatio-Temporal Continuity, and Solidity.
arXiv Detail & Related papers (2025-06-11T15:21:16Z)
PhysGaia: A Physics-Aware Dataset of Multi-Body Interactions for Dynamic Novel View Synthesis [62.283499219361595]
PhysGaia is a physics-aware dataset specifically designed for Dynamic Novel View Synthesis (DyNVS)<n>Our dataset provides complex dynamic scenarios with rich interactions among multiple objects.<n>PhysGaia will significantly advance research in dynamic view synthesis, physics-based scene understanding, and deep learning models integrated with physical simulation.
arXiv Detail & Related papers (2025-06-03T12:19:18Z)
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation [12.120541052871486]
generative models produce high-quality videos that excel in aesthetic appeal and accurate instruction following.<n>Many outputs still violate basic constraints such as rigid-body collisions, energy conservation, and gravitational dynamics.<n>Existing physical-evaluation benchmarks rely on automatic, pixel-level metrics applied to simplistic, life-scenario prompts.<n>We introduce textbfT2VPhysBench, a first-principled benchmark that systematically evaluates whether state-of-the-art text-to-video systems obey twelve core physical laws.
arXiv Detail & Related papers (2025-05-01T06:34:55Z)
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models [33.45006997591683]
PHYBench is a benchmark of 500 original physics problems ranging from high school to Physics Olympiad difficulty.<n>PHYBench addresses data contamination through original content and employs a systematic curation pipeline to eliminate flawed items.<n> Evaluations show that PHYBench activates more tokens and provides stronger differentiation between reasoning models.
arXiv Detail & Related papers (2025-04-22T17:53:29Z)
Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments [55.465371691714296]
We introduce Morpheus, a benchmark for evaluating video generation models on physical reasoning.<n>It features 80 real-world videos capturing physical phenomena, guided by conservation laws.<n>Our findings reveal that even with advanced prompting and video conditioning, current models struggle to encode physical principles.
arXiv Detail & Related papers (2025-04-03T15:21:17Z)
PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos [21.441062722848265]
PhysTwin is a novel framework that uses sparse videos of dynamic objects under interaction to produce a photo- and physically realistic, real-time interactive replica.<n>Our approach centers on two key components: (1) a physics-informed representation that combines spring-mass models for realistic physical simulation, and generative shape models for geometry, and Gaussian splats for rendering.<n>Our method integrates an inverse physics framework with visual perception cues, enabling high-fidelity reconstruction even from partial, occluded, and limited viewpoints.
arXiv Detail & Related papers (2025-03-23T07:49:19Z)
Grounding Creativity in Physics: A Brief Survey of Physical Priors in AIGC [14.522189177415724]
Recent advancements in AI-generated content have significantly improved the realism of 3D and 4D generation.<n>Most existing methods prioritize appearance consistency while neglecting underlying physical principles.<n>This survey provides a review of physics-aware generative methods, systematically analyzing how physical constraints are integrated into 3D and 4D generation.
arXiv Detail & Related papers (2025-02-10T20:13:16Z)
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation [51.750634349748736]
Text-to-video (T2V) models have made significant strides in visualizing complex prompts. However, the capacity of these models to accurately represent intuitive physics remains largely unexplored. We introduce PhyGenBench to evaluate physical commonsense correctness in T2V generation.
arXiv Detail & Related papers (2024-10-07T17:56:04Z)
VideoPhy: Evaluating Physical Commonsense for Video Generation [93.28748850301949]
We present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities. We then generate videos conditioned on captions from diverse state-of-the-art text-to-video generative models. Our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts.
arXiv Detail & Related papers (2024-06-05T17:53:55Z)
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense. We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy. We also introduce an oracle model (ContPRO) that marries the particle-based physical dynamic models with the recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.