Exploring the Evolution of Physics Cognition in Video Generation: A Survey
- URL: http://arxiv.org/abs/2503.21765v1
- Date: Thu, 27 Mar 2025 17:58:33 GMT
- Title: Exploring the Evolution of Physics Cognition in Video Generation: A Survey
- Authors: Minghui Lin, Xiang Wang, Yishan Wang, Shu Wang, Fengqi Dai, Pengxiang Ding, Cunxiang Wang, Zhengrong Zuo, Nong Sang, Siteng Huang, Donglin Wang
- Abstract summary: Given the lack of a systematic overview in this field, this survey aims to provide a comprehensive summary of architecture designs and their applications. We discuss and organize the evolutionary process of physical cognition in video generation from a cognitive science perspective. We propose a three-tier taxonomy: 1) basic schema perception for generation, 2) passive cognition of physical knowledge for generation, and 3) active cognition for world simulation.
- Score: 44.305405114910904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in video generation have witnessed significant progress, especially with the rapid advancement of diffusion models. Despite this, their deficiencies in physical cognition have gradually received widespread attention - generated content often violates the fundamental laws of physics, falling into the dilemma of "visual realism but physical absurdity". Researchers began to increasingly recognize the importance of physical fidelity in video generation and attempted to integrate heuristic physical cognition such as motion representations and physical knowledge into generative systems to simulate real-world dynamic scenarios. Considering the lack of a systematic overview in this field, this survey aims to provide a comprehensive summary of architecture designs and their applications to fill this gap. Specifically, we discuss and organize the evolutionary process of physical cognition in video generation from a cognitive science perspective, while proposing a three-tier taxonomy: 1) basic schema perception for generation, 2) passive cognition of physical knowledge for generation, and 3) active cognition for world simulation, encompassing state-of-the-art methods, classical paradigms, and benchmarks. Subsequently, we emphasize the inherent key challenges in this domain and delineate potential pathways for future research, contributing to advancing the frontiers of discussion in both academia and industry. Through structured review and interdisciplinary analysis, this survey aims to provide directional guidance for developing interpretable, controllable, and physically consistent video generation paradigms, thereby propelling generative models from the stage of "visual mimicry" towards a new phase of "human-like physical comprehension".
Related papers
- Grounding Creativity in Physics: A Brief Survey of Physical Priors in AIGC [14.522189177415724]
Recent advancements in AI-generated content have significantly improved the realism of 3D and 4D generation.
Most existing methods prioritize appearance consistency while neglecting underlying physical principles.
This survey provides a review of physics-aware generative methods, systematically analyzing how physical constraints are integrated into 3D and 4D generation.
(arXiv, 2025-02-10T20:13:16Z)
- Generative Physical AI in Vision: A Survey [25.867330158975932]
Generative Artificial Intelligence (AI) has rapidly advanced the field of computer vision by enabling machines to create and interpret visual data with unprecedented sophistication.
As generative AI evolves to increasingly integrate physical realism and dynamic simulation, its potential to function as a "world simulator"
This survey systematically reviews this emerging field of physics-aware generative AI in computer vision.
(arXiv, 2025-01-19T03:19:47Z)
- Integrating Physics and Topology in Neural Networks for Learning Rigid Body Dynamics [6.675805308519987]
We introduce a novel framework for modeling rigid body dynamics and learning collision interactions.
We propose a physics-informed message-passing neural architecture, embedding physical laws directly in the model.
This work addresses the challenge of multi-entity dynamic interactions, with applications spanning diverse scientific and engineering domains.
(arXiv, 2024-11-18T11:03:15Z)
- Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation [51.750634349748736]
Text-to-video (T2V) models have made significant strides in visualizing complex prompts.
However, the capacity of these models to accurately represent intuitive physics remains largely unexplored.
We introduce PhyGenBench to evaluate physical commonsense correctness in T2V generation.
(arXiv, 2024-10-07T17:56:04Z)
- Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation [30.245348014602577]
We trace the evolution of text-to-video generation, from animating MNIST digits to simulating the physical world with Sora.
Our review of the shortcomings of Sora-generated videos points to the need for more in-depth studies in various enabling aspects of video generation.
We conclude that the study of text-to-video generation may still be in its infancy, requiring contributions from the cross-disciplinary research community.
(arXiv, 2024-03-08T07:58:13Z)
- Visual cognition in multimodal large language models [12.603212933816206]
Recent advancements have rekindled interest in the potential to emulate human-like cognitive abilities.
This paper evaluates the current state of vision-based large language models in the domains of intuitive physics, causal reasoning, and intuitive psychology.
(arXiv, 2023-11-27T18:58:34Z)
- Human Motion Generation: A Survey [67.38982546213371]
Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications.
Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts.
We present a comprehensive literature review of human motion generation, which is the first of its kind in this field.
(arXiv, 2023-07-20T14:15:20Z)
- Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models [86.25460882547581]
We introduce the PHYsical Concepts Inference NEtwork (PHYCINE), a system that infers physical concepts at different levels of abstraction without supervision.
We show that object representations containing the discovered physical concept variables could help achieve better performance in causal reasoning tasks.
(arXiv, 2023-03-03T11:52:21Z)
- Deep Learning to See: Towards New Foundations of Computer Vision [88.69805848302266]
This book criticizes the supposed scientific progress in the field of computer vision.
It proposes the investigation of vision within the framework of information-based laws of nature.
(arXiv, 2022-06-30T15:20:36Z)
- A Survey on Machine Learning Approaches for Modelling Intuitive Physics [1.3190581566723918]
The ability to understand and reason about the physical world is a cognitive ability commonly known as intuitive physics.
Many of the contemporary approaches in modelling intuitive physics for machine cognition have been inspired by literature from cognitive science.
This paper presents a comprehensive survey of recent advances and techniques in intuitive physics-inspired deep learning approaches for physical reasoning.
(arXiv, 2022-02-14T04:44:44Z)