Related papers: A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction

URL: http://arxiv.org/abs/2502.05503v2
Date: Tue, 18 Feb 2025 09:07:09 GMT
Title: A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
Authors: Yongfan Chen, Xiuwen Zhu, Tianyu Li, Hao Chen, Chunhua Shen
Abstract summary: We introduce PhyCoBench, a benchmark designed specifically to assess the physical coherence of generated videos. Our benchmark includes 120 prompts covering 7 categories of physical principles, capturing key physical laws observable in video content. We propose an automated evaluation model, PhyCoPredictor, a diffusion model that generates optical flow and video frames in a cascade manner.
Abstract: Recent advances in video generation models demonstrate their potential as world simulators, but these models often generate videos that deviate from physical laws, a key concern that most text-to-video (T2V) benchmarks overlook. We introduce PhyCoBench, a benchmark designed specifically to assess the physical coherence of generated videos. Our benchmark includes 120 prompts covering 7 categories of physical principles, capturing key physical laws observable in video content. We evaluated four state-of-the-art (SoTA) T2V models on PhyCoBench and conducted manual assessments. Additionally, we propose an automated evaluation model, PhyCoPredictor, a diffusion model that generates optical flow and video frames in a cascade manner. In a consistency evaluation comparing automated and manual rankings, the experimental results show that PhyCoPredictor currently aligns most closely with human evaluation. It can therefore effectively evaluate the physical coherence of videos and provide insights for future model optimization. Our benchmark, including the physical coherence prompts, the automatic evaluation tool PhyCoPredictor, and the generated video dataset, has been released on GitHub at https://github.com/Jeckinchen/PhyCoBench.
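The abstract reports a consistency evaluation comparing the automated ranking of the four T2V models with the manual ranking, but it does not name the agreement statistic. As a minimal sketch, assuming Kendall's tau as that statistic (an illustrative choice, not necessarily the paper's), agreement between two strict rankings can be computed as below. The model names and rankings are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations


def kendall_tau(rank_a: list[str], rank_b: list[str]) -> float:
    """Kendall's tau-a between two strict rankings (best first) of the same items.

    Assumption: Kendall's tau is used only as an illustrative agreement
    measure between an automated ranking and a human ranking.
    """
    if sorted(rank_a) != sorted(rank_b) or len(set(rank_a)) != len(rank_a):
        raise ValueError("rankings must contain the same distinct items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items to compare")
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # Positive product: both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


# Hypothetical rankings of four placeholder models (not data from the paper).
human = ["model_A", "model_B", "model_C", "model_D"]
automated = ["model_A", "model_C", "model_B", "model_D"]
print(round(kendall_tau(human, automated), 3))  # 0.667
```

A tau of 1 means identical orderings and -1 means fully reversed orderings; swapping one adjacent pair among four models leaves 5 of 6 pairs concordant, giving 4/6.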

Related papers

Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos
We propose a novel method to selectively incorporate physics models with kinematic observations in an online setting. A recurrent neural network is introduced to realize a Kalman filter that attentively balances the kinematic input and the simulated motion. The proposed approach excels in the physics-based human pose estimation task and demonstrates the physical plausibility of the predictive dynamics.
arXiv (2024-10-10T10:24:59Z)
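The summary above describes a recurrent network that realizes a Kalman filter balancing kinematic input against simulated motion. A minimal sketch of the classical scalar Kalman measurement update shows what that balancing means. It assumes a one-dimensional state and fixed, hand-picked variances; in the paper the balancing is learned by a recurrent network, so this illustrates the underlying filter, not the authors' method. The function name `kalman_update` is hypothetical.

```python
def kalman_update(x_pred: float, p_pred: float, z_obs: float, r_obs: float) -> tuple[float, float]:
    """One scalar Kalman measurement update (classical closed-form gain).

    x_pred, p_pred: prediction (e.g. from a physics simulation) and its variance.
    z_obs, r_obs: observation (e.g. a kinematic estimate) and its noise variance.
    Returns the fused estimate and its posterior variance.
    """
    if p_pred < 0 or r_obs < 0 or p_pred + r_obs == 0:
        raise ValueError("variances must be non-negative and not both zero")
    # Kalman gain: weight on the observation, larger when the prediction is less certain.
    gain = p_pred / (p_pred + r_obs)
    x_post = x_pred + gain * (z_obs - x_pred)
    p_post = (1.0 - gain) * p_pred
    return x_post, p_post


# Hypothetical 1-D joint angle: the simulation predicts 0.0, the kinematic estimate says 2.0.
print(kalman_update(0.0, 1.0, 2.0, 3.0))
```

With a prediction variance of 1.0 and an observation variance of 3.0, the gain is 0.25, so the fused estimate moves a quarter of the way toward the observation (0.5) and its variance shrinks to 0.75.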
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
Text-to-video (T2V) models have made significant strides in visualizing complex prompts. However, the capacity of these models to accurately represent intuitive physics remains largely unexplored. We introduce PhyGenBench to evaluate physical commonsense correctness in T2V generation.
arXiv (2024-10-07T17:56:04Z)
Evaluation of Text-to-Video Generation Models: A Dynamics Perspective
Existing evaluation protocols primarily focus on temporal consistency and content continuity. We propose an effective evaluation protocol, termed DEVIL, which centers on the dynamics dimension to evaluate T2V models.
arXiv (2024-07-01T08:51:22Z)
VideoPhy: Evaluating Physical Commonsense for Video Generation
We present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities. We then generate videos conditioned on captions from diverse state-of-the-art text-to-video generative models. Our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts.
arXiv (2024-06-05T17:53:55Z)
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
ContPhy is a novel benchmark for assessing machine physical commonsense. We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy. We also introduce an oracle model (ContPRO) that marries particle-based physical dynamic models with recent large language models.
arXiv (2024-02-09T01:09:21Z)
STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models
Video generative models struggle to generate even short video clips. Current video evaluation metrics are simple adaptations of image metrics, obtained by replacing the image embeddings with those of video embedding networks. We propose STREAM, a new video evaluation metric uniquely designed to evaluate spatial and temporal aspects independently.
arXiv (2024-01-30T08:18:20Z)
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
We argue that it is hard to judge large conditional generative models using simple metrics, since these models are often trained on very large datasets and have multi-aspect abilities. Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation. We then evaluate state-of-the-art video generative models on our carefully designed benchmark in terms of visual quality, content quality, motion quality, and text-video alignment, using 17 well-selected objective metrics.
arXiv (2023-10-17T17:50:46Z)
A Brief Survey on Adaptive Video Streaming Quality Assessment
Quality of experience (QoE) assessment for adaptive video streaming plays a significant role in advanced network management systems. We analyze and compare different variations of objective QoE assessment models for adaptive video streaming, with or without machine learning techniques. We find that existing video streaming QoE assessment models still have limited performance, which makes them difficult to apply in practical communication systems.
arXiv (2022-02-25T21:38:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
