Exploring the Evolution of Physics Cognition in Video Generation: A Survey
- URL: http://arxiv.org/abs/2503.21765v1
- Date: Thu, 27 Mar 2025 17:58:33 GMT
- Title: Exploring the Evolution of Physics Cognition in Video Generation: A Survey
- Authors: Minghui Lin, Xiang Wang, Yishan Wang, Shu Wang, Fengqi Dai, Pengxiang Ding, Cunxiang Wang, Zhengrong Zuo, Nong Sang, Siteng Huang, Donglin Wang
- Abstract summary: This survey aims to provide a comprehensive summary of architecture designs and their applications to fill this gap. We discuss and organize the evolutionary process of physical cognition in video generation from a cognitive science perspective. We propose a three-tier taxonomy: 1) basic schema perception for generation, 2) passive cognition of physical knowledge for generation, and 3) active cognition for world simulation.
- Score: 44.305405114910904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video generation has recently made significant progress, especially with the rapid advancement of diffusion models. Despite this, the deficiencies of these models in physical cognition have gradually received widespread attention: generated content often violates the fundamental laws of physics, falling into the dilemma of "visual realism but physical absurdity". Researchers have increasingly recognized the importance of physical fidelity in video generation and have attempted to integrate heuristic physical cognition, such as motion representations and physical knowledge, into generative systems to simulate real-world dynamic scenarios. Considering the lack of a systematic overview in this field, this survey aims to provide a comprehensive summary of architecture designs and their applications to fill this gap. Specifically, we discuss and organize the evolutionary process of physical cognition in video generation from a cognitive science perspective, while proposing a three-tier taxonomy: 1) basic schema perception for generation, 2) passive cognition of physical knowledge for generation, and 3) active cognition for world simulation, encompassing state-of-the-art methods, classical paradigms, and benchmarks. Subsequently, we emphasize the key challenges inherent in this domain and delineate potential pathways for future research, contributing to advancing the frontiers of discussion in both academia and industry. Through structured review and interdisciplinary analysis, this survey aims to provide directional guidance for developing interpretable, controllable, and physically consistent video generation paradigms, thereby propelling generative models from the stage of "visual mimicry" towards a new phase of "human-like physical comprehension".
Related papers
- "PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models [38.14213802594432]
PhyWorldBench is a benchmark designed to evaluate video generation models based on their adherence to the laws of physics. We introduce a novel "Anti-Physics" category, where prompts intentionally violate real-world physics. We evaluate 12 state-of-the-art text-to-video generation models, including five open-source and five proprietary models.
arXiv Detail & Related papers (2025-07-17T17:54:09Z) - Motion Generation: A Survey of Generative Approaches and Benchmarks [1.4254358932994455]
We provide an in-depth categorization of motion generation methods based on their underlying generative strategies. Our main focus is on papers published in top-tier venues since 2023, reflecting the most recent advancements in the field. We analyze architectural principles, conditioning mechanisms, and generation settings, and compile a detailed overview of the evaluation metrics and datasets used across the literature.
arXiv Detail & Related papers (2025-07-07T19:04:56Z) - Advances in Radiance Field for Dynamic Scene: From Neural Field to Gaussian Field [85.12359852781216]
This survey presents a systematic analysis of over 200 papers focused on dynamic scene representation using radiance fields. We organize diverse methodological approaches under a unified representational framework, concluding with a critical examination of persistent challenges and promising research directions.
arXiv Detail & Related papers (2025-05-15T07:51:08Z) - Grounding Creativity in Physics: A Brief Survey of Physical Priors in AIGC [14.522189177415724]
Recent advancements in AI-generated content have significantly improved the realism of 3D and 4D generation. Most existing methods prioritize appearance consistency while neglecting underlying physical principles. This survey provides a review of physics-aware generative methods, systematically analyzing how physical constraints are integrated into 3D and 4D generation.
arXiv Detail & Related papers (2025-02-10T20:13:16Z) - Generative Physical AI in Vision: A Survey [25.867330158975932]
Generative Artificial Intelligence (AI) has rapidly advanced the field of computer vision by enabling machines to create and interpret visual data with unprecedented sophistication. As generative AI evolves to increasingly integrate physical realism and dynamic simulation, it shows potential to function as a "world simulator". This survey systematically reviews this emerging field of physics-aware generative AI in computer vision.
arXiv Detail & Related papers (2025-01-19T03:19:47Z) - Integrating Physics and Topology in Neural Networks for Learning Rigid Body Dynamics [6.675805308519987]
We introduce a novel framework for modeling rigid body dynamics and learning collision interactions.
We propose a physics-informed message-passing neural architecture, embedding physical laws directly in the model.
This work addresses the challenge of multi-entity dynamic interactions, with applications spanning diverse scientific and engineering domains.
arXiv Detail & Related papers (2024-11-18T11:03:15Z) - Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation [51.750634349748736]
Text-to-video (T2V) models have made significant strides in visualizing complex prompts.
However, the capacity of these models to accurately represent intuitive physics remains largely unexplored.
We introduce PhyGenBench to evaluate physical commonsense correctness in T2V generation.
arXiv Detail & Related papers (2024-10-07T17:56:04Z) - Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation [30.245348014602577]
We discuss the evolution of video generation from text, starting with animating MNIST numbers to simulating the physical world with Sora.
Our review into the shortcomings of Sora-generated videos pinpoints the call for more in-depth studies in various enabling aspects of video generation.
We conclude that the study of the text-to-video generation may still be in its infancy, requiring contribution from the cross-discipline research community.
arXiv Detail & Related papers (2024-03-08T07:58:13Z) - Visual cognition in multimodal large language models [12.603212933816206]
Recent advancements have rekindled interest in the potential to emulate human-like cognitive abilities.
This paper evaluates the current state of vision-based large language models in the domains of intuitive physics, causal reasoning, and intuitive psychology.
arXiv Detail & Related papers (2023-11-27T18:58:34Z) - Human Motion Generation: A Survey [67.38982546213371]
Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications.
Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts.
We present a comprehensive literature review of human motion generation, which is the first of its kind in this field.
arXiv Detail & Related papers (2023-07-20T14:15:20Z) - Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models [86.25460882547581]
We introduce the PHYsical Concepts Inference NEtwork (PHYCINE), a system that infers physical concepts at different levels of abstraction without supervision.
We show that object representations containing the discovered physical concept variables help achieve better performance in causal reasoning tasks.
arXiv Detail & Related papers (2023-03-03T11:52:21Z) - Deep Learning to See: Towards New Foundations of Computer Vision [88.69805848302266]
This book criticizes the supposed scientific progress in the field of computer vision.
It proposes the investigation of vision within the framework of information-based laws of nature.
arXiv Detail & Related papers (2022-06-30T15:20:36Z) - A Survey on Machine Learning Approaches for Modelling Intuitive Physics [1.3190581566723918]
The ability to understand and reason about the behavior of the physical world is a cognitive ability commonly known as intuitive physics.
Many of the contemporary approaches in modelling intuitive physics for machine cognition have been inspired by literature from cognitive science.
This paper presents a comprehensive survey of recent advances and techniques in intuitive physics-inspired deep learning approaches for physical reasoning.
arXiv Detail & Related papers (2022-02-14T04:44:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.