Do generative video models understand physical principles?
- URL: http://arxiv.org/abs/2501.09038v3
- Date: Thu, 27 Feb 2025 15:10:51 GMT
- Title: Do generative video models understand physical principles?
- Authors: Saman Motamed, Laura Culp, Kevin Swersky, Priyank Jaini, Robert Geirhos
- Abstract summary: AI video generation is undergoing a revolution, with quality and realism advancing rapidly. Do video models learn "world models" that discover laws of physics, or are they merely sophisticated pixel predictors that achieve visual realism without understanding the physical principles of reality? We address this question by developing Physics-IQ, a benchmark dataset that can only be solved by acquiring a deep understanding of various physical principles.
- Score: 15.534227431706773
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI video generation is undergoing a revolution, with quality and realism advancing rapidly. These advances have led to a passionate scientific debate: Do video models learn "world models" that discover laws of physics -- or, alternatively, are they merely sophisticated pixel predictors that achieve visual realism without understanding the physical principles of reality? We address this question by developing Physics-IQ, a comprehensive benchmark dataset that can only be solved by acquiring a deep understanding of various physical principles, like fluid dynamics, optics, solid mechanics, magnetism and thermodynamics. We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited, and unrelated to visual realism. At the same time, some test cases can already be successfully solved. This indicates that acquiring certain physical principles from observation alone may be possible, but significant challenges remain. While we expect rapid advances ahead, our work demonstrates that visual realism does not imply physical understanding. Our project page is at https://physics-iq.github.io; code at https://github.com/google-deepmind/physics-IQ-benchmark.
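To make the evaluation idea concrete, here is a minimal, hypothetical sketch of how a benchmark of this kind might score a model: compare the model's generated continuation frames against the real-world continuation. The actual Physics-IQ metrics (motion-masked, IoU-style) are defined in the paper and repository; all function names and the raw-MSE scoring below are illustrative assumptions, not the benchmark's implementation.

```python
# Hypothetical sketch: scoring a video model's continuation against the
# ground-truth continuation. Physics-IQ's real metrics differ (see the paper).
import numpy as np

def mse_score(generated: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean squared error between two (T, H, W, C) uint8 frame stacks."""
    assert generated.shape == ground_truth.shape
    g = generated.astype(np.float64) / 255.0
    t = ground_truth.astype(np.float64) / 255.0
    return float(np.mean((g - t) ** 2))

def physics_score(generated: np.ndarray, ground_truth: np.ndarray) -> float:
    """Map MSE to a [0, 1] score where 1 means a pixel-perfect continuation.
    Real benchmarks use motion masks and IoU-style metrics, not raw MSE."""
    return 1.0 / (1.0 + mse_score(generated, ground_truth))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.integers(0, 256, size=(16, 64, 64, 3), dtype=np.uint8)
    pred = truth.copy()  # stand-in for a model's generated continuation
    print(physics_score(pred, truth))  # 1.0 for a perfect match
```

A raw pixel metric like this would reward memorization as much as physics; the paper's masked, motion-aware metrics exist precisely to separate physical plausibility from pixel-level similarity.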
Related papers
- Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments [55.465371691714296]
We introduce Morpheus, a benchmark for evaluating video generation models on physical reasoning.
It features 80 real-world videos capturing physical phenomena, guided by conservation laws.
Our findings reveal that even with advanced prompting and video conditioning, current models struggle to encode physical principles.
arXiv Detail & Related papers (2025-04-03T15:21:17Z) - VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior [88.51778468222766]
Video diffusion models (VDMs) have advanced significantly in recent years, enabling the generation of highly realistic videos.
However, VDMs often fail to produce physically plausible videos due to an inherent lack of understanding of physics.
We propose a novel two-stage image-to-video generation framework that explicitly incorporates physics with vision and language informed physical prior.
arXiv Detail & Related papers (2025-03-30T09:03:09Z) - Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning [76.94237859217469]
Physical AI systems need to perceive, understand, and perform complex actions in the physical world.
We present models that can understand the physical world and generate appropriate embodied decisions.
We use a hierarchical ontology that captures fundamental knowledge about space, time, and physics.
For embodied reasoning, we rely on a two-dimensional ontology that generalizes across different physical embodiments.
arXiv Detail & Related papers (2025-03-18T22:06:58Z) - WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation [43.71082938654985]
We introduce the World Simulator Assistant (WISA), an effective framework for decomposing and incorporating physical principles into T2V models.
WISA decomposes physical principles into textual physical descriptions, qualitative physical categories, and quantitative physical properties.
We propose a novel video dataset, WISA-32K, collected based on qualitative physical categories.
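For intuition, WISA's three-level decomposition might be represented as a simple record type. This is a hypothetical sketch under assumed field names, not the paper's actual schema or dataset format.

```python
# Hypothetical sketch of WISA's three-level decomposition of a physical
# principle. Field names and example values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PhysicalAnnotation:
    text_description: str                       # textual physical description
    qualitative_category: str                   # e.g. "fluid dynamics"
    quantitative_properties: dict[str, float]   # e.g. measured constants

example = PhysicalAnnotation(
    text_description="Water poured from a glass falls and spreads on the table.",
    qualitative_category="fluid dynamics",
    quantitative_properties={"gravity_m_s2": 9.81},
)
print(example.qualitative_category)
```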
arXiv Detail & Related papers (2025-03-11T08:10:03Z) - Discover physical concepts and equations with machine learning [7.565272546753481]
We propose a model that combines Variational Autoencoders (VAE) with Neural Ordinary Differential Equations (Neural ODEs)
This allows us to simultaneously discover physical concepts and governing equations from simulated experimental data.
We apply the model to several examples inspired by the history of physics, including Copernicus' heliocentrism, Newton's law of gravity, Schrödinger's wave mechanics, and Pauli's spin-magnetic formulation.
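A minimal sketch of the core idea follows, assuming a PyTorch implementation: a VAE-style encoder maps an observation to a latent state whose time evolution is governed by a learned ODE, here integrated with an explicit Euler step standing in for a full Neural ODE solver. The paper's architecture and training objective will differ; all names and sizes are illustrative.

```python
# Sketch: VAE encoder + learned latent dynamics (dz/dt = f(z)), Euler-integrated.
import torch
import torch.nn as nn

class LatentODE(nn.Module):
    def __init__(self, obs_dim: int = 8, latent_dim: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(),
                                     nn.Linear(32, 2 * latent_dim))
        self.dynamics = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(),
                                      nn.Linear(32, latent_dim))  # dz/dt = f(z)
        self.decoder = nn.Linear(latent_dim, obs_dim)

    def forward(self, x0: torch.Tensor, steps: int, dt: float = 0.1):
        mu, log_var = self.encoder(x0).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        trajectory = []
        for _ in range(steps):
            z = z + dt * self.dynamics(z)  # explicit Euler step of the ODE
            trajectory.append(self.decoder(z))
        return torch.stack(trajectory, dim=1), mu, log_var

model = LatentODE()
x0 = torch.randn(4, 8)                  # batch of initial observations
recon, mu, log_var = model(x0, steps=5)
print(recon.shape)                      # torch.Size([4, 5, 8])
```

Training such a model on observed trajectories (reconstruction loss plus the usual KL term) is what lets the latent state and its dynamics align with physical concepts and governing equations.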
arXiv Detail & Related papers (2024-12-11T15:30:21Z) - PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos [66.09921831504238]
We propose PhysGame as a pioneering benchmark to evaluate physical commonsense violations in gameplay videos.
Our findings reveal that the performance of current open-source video LLMs significantly lags behind that of proprietary counterparts.
Based on the suite of datasets, we propose PhysVLM as a physical knowledge-enhanced video LLM.
arXiv Detail & Related papers (2024-12-02T18:47:25Z) - PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation [29.831214435147583]
We present PhysGen, a novel image-to-video generation method.
It produces a realistic, physically plausible, and temporally consistent video.
Our key insight is to integrate model-based physical simulation with a data-driven video generation process.
arXiv Detail & Related papers (2024-09-27T17:59:57Z) - Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806]
We introduce latent intuitive physics, a transfer learning framework for physics simulation.
It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes.
We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z) - VideoPhy: Evaluating Physical Commonsense for Video Generation [93.28748850301949]
We present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities.
We then generate videos conditioned on the benchmark captions using diverse state-of-the-art text-to-video generative models.
Our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts.
arXiv Detail & Related papers (2024-06-05T17:53:55Z) - DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors [75.83647027123119]
We propose to learn the physical properties of a material field with video diffusion priors.
We then utilize a physics-based Material-Point-Method simulator to generate 4D content with realistic motions.
arXiv Detail & Related papers (2024-06-03T16:05:25Z) - Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language [92.7638697243969]
We propose a unified framework that can jointly learn visual concepts and infer physics models of objects from videos and language.
This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.
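To illustrate what the differentiable-physics-engine component makes possible, here is a hedged sketch in PyTorch: a tiny differentiable projectile simulator through which a physical parameter (gravity) is recovered by gradient descent from observed positions. It stands in for the paper's full engine; all names and values are illustrative assumptions.

```python
# Sketch: recover gravity from observations via a differentiable simulator.
import torch

def simulate(g: torch.Tensor, v0: float = 5.0, steps: int = 20, dt: float = 0.05):
    """Differentiable 1-D vertical throw: heights at each time step."""
    t = torch.arange(1, steps + 1, dtype=torch.float32) * dt
    return v0 * t - 0.5 * g * t ** 2

true_g = torch.tensor(9.81)
observed = simulate(true_g)             # stands in for positions from a video

g_hat = torch.tensor(5.0, requires_grad=True)
opt = torch.optim.Adam([g_hat], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = ((simulate(g_hat) - observed) ** 2).mean()
    loss.backward()                     # gradients flow through the simulator
    opt.step()
print(float(g_hat))                     # converges near 9.81
```

Because the simulator is written in the autodiff framework itself, gradients of the fit error flow back into physical parameters, which is what lets a perception module and a physics engine be trained jointly end to end.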
arXiv Detail & Related papers (2021-10-28T17:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.