On the Content Bias in Fréchet Video Distance
- URL: http://arxiv.org/abs/2404.12391v1
- Date: Thu, 18 Apr 2024 17:59:58 GMT
- Title: On the Content Bias in Fréchet Video Distance
- Authors: Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, Jia-Bin Huang
- Abstract summary: Fréchet Video Distance (FVD) is a prominent metric for evaluating video generation models.
In this paper, we aim to explore the extent of FVD's bias toward per-frame quality over temporal realism.
We show that FVD with features extracted from recent large-scale self-supervised video models is less biased toward image quality.
- Score: 42.717821654744796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fréchet Video Distance (FVD), a prominent metric for evaluating video generation models, is known to occasionally conflict with human perception. In this paper, we explore the extent of FVD's bias toward per-frame quality over temporal realism and identify its sources. We first quantify FVD's sensitivity to the temporal axis by decoupling frame and motion quality, and find that FVD increases only slightly even under large temporal corruption. We then analyze generated videos and show that, by carefully sampling from a large set of generated videos that contain no motion, one can drastically decrease FVD without improving temporal quality. Both studies suggest that FVD is biased toward the quality of individual frames. We further observe that this bias can be attributed to the features extracted from a supervised video classifier trained on a content-biased dataset. We show that FVD computed with features from recent large-scale self-supervised video models is less biased toward image quality. Finally, we revisit a few real-world examples to validate our hypothesis.
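For reference, FVD is the Fréchet (2-Wasserstein) distance between Gaussians fit to clip-level features of real and generated videos: ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2(Sigma_r Sigma_g)^(1/2)). Below is a minimal sketch of that computation, assuming features have already been extracted by some video backbone; the random arrays and the 400-dimensional feature size are placeholder assumptions, not the paper's setup.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet (2-Wasserstein) distance between Gaussians fit to two
    feature sets, each of shape (num_videos, feature_dim)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; numerical error can
    # introduce tiny imaginary components, which we discard.
    covmean = linalg.sqrtm(sigma_a @ sigma_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(sigma_a + sigma_b - 2.0 * covmean))

# Hypothetical usage with placeholder features (one row per video clip).
rng = np.random.default_rng(0)
real_feats = rng.standard_normal((2048, 400))
fake_feats = rng.standard_normal((2048, 400)) + 0.1  # small mean shift
print(frechet_distance(real_feats, fake_feats))
```

The content bias discussed in the abstract lives in the backbone that produces these features, not in the distance itself: the paper's finding is that replacing the supervised I3D classifier features with features from large-scale self-supervised video models reduces the bias toward per-frame quality.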
Related papers
- Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality [8.068194154084967]
The Fréchet Video Distance (FVD) is a widely adopted metric for evaluating the distributional quality of generated videos.
Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; and (3) the impractical sample sizes required for reliable estimation.
arXiv Detail & Related papers (2024-10-07T17:07:21Z)
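The third limitation above (impractical sample sizes) is easy to reproduce with synthetic features. A hedged sketch under toy assumptions (low-dimensional Gaussian features, not the paper's protocol): the Fréchet distance between two disjoint halves of the same set should be zero, yet the plug-in estimator is biased upward at small sample sizes.

```python
import numpy as np
from scipy import linalg

def frechet_distance(a: np.ndarray, b: np.ndarray) -> float:
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    sig_a, sig_b = np.cov(a, rowvar=False), np.cov(b, rowvar=False)
    covmean = linalg.sqrtm(sig_a @ sig_b)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(sig_a + sig_b - 2.0 * covmean.real))

rng = np.random.default_rng(0)
# 64-D stand-in features, kept low-dimensional so the covariance
# estimates stay well-conditioned in this toy demonstration.
feats = rng.standard_normal((8192, 64))

# Both halves come from the same distribution, so the true distance is 0;
# the printed values are pure estimator bias and shrink as n grows.
for n in (128, 512, 2048, 4096):
    idx = rng.permutation(len(feats))[: 2 * n]
    print(n, round(frechet_distance(feats[idx[:n]], feats[idx[n:]]), 3))
```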
- STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models [6.855409699832414]
Video generative models still struggle to generate even short video clips.
Current video evaluation metrics are simple adaptations of image metrics that swap image embeddings for those of video embedding networks.
We propose STREAM, a new video evaluation metric uniquely designed to independently evaluate spatial and temporal aspects.
arXiv Detail & Related papers (2024-01-30T08:18:20Z)
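A common probe for the temporal sensitivity that STREAM targets, and the same frame/motion decoupling described in the main abstract above, is to shuffle frames within each clip: per-frame content is untouched while all motion is destroyed, so a metric that only rewards image quality barely moves. A minimal sketch, where the pixel-space clips and their shapes are illustrative assumptions:

```python
import numpy as np

def shuffle_time(videos: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Permute frames independently within each clip. Per-frame content
    is preserved exactly; temporal structure is destroyed."""
    out = videos.copy()
    for clip in out:       # each clip is a view, so shuffling edits `out`
        rng.shuffle(clip)  # shuffles along the frame axis (axis 0)
    return out

rng = np.random.default_rng(0)
videos = rng.random((16, 32, 64, 64, 3))  # (clips, frames, H, W, C) stand-in
shuffled = shuffle_time(videos, rng)

# To run the probe, extract features from `videos` and `shuffled` with the
# metric's backbone and compare the scores: a temporally sensitive metric
# should change sharply, while a content-biased one changes only slightly.
```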
- Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention [72.12974259966592]
We present a systematic study of the temporal bias caused by the frame-length discrepancy between the training and test sets of trimmed video clips.
We propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets.
arXiv Detail & Related papers (2023-09-17T15:58:27Z)
- Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment [55.65173181828863]
We propose a temporal perceptual quality index (TPQI) that measures temporal distortion by describing the graphic morphology of the perceptual representation.
Experiments show that TPQI is an effective way of predicting subjective temporal quality.
arXiv Detail & Related papers (2022-07-08T07:30:51Z)
- DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment [56.42140467085586]
Some temporal variations cause temporal distortions and lead to additional quality degradation.
The human visual system often attends differently to frames with different content.
We propose a novel and effective transformer-based VQA method to tackle these two issues.
arXiv Detail & Related papers (2022-06-20T15:31:27Z)
- Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moiré patterns, appearing as color distortions, severely degrade image and video quality when filming a screen with a digital camera.
We study how to remove such undesirable moiré patterns from videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z)
- Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a single pair of original and processed videos rather than on a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z)
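A minimal sketch of the single-video training loop this entry describes, under loud assumptions: the tiny convolutional network, clip shapes, learning rate, and step count below are placeholders, not the authors' architecture. The network is fit to map the original clip to its per-frame processed counterpart, and its implicit prior tends to yield temporally consistent predictions before it overfits the flicker.

```python
import torch
import torch.nn as nn

# Placeholder clips: `original` is the unprocessed video, `processed` the
# per-frame processed (temporally inconsistent) result. Shape: (T, 3, H, W).
original = torch.rand(30, 3, 64, 64)
processed = torch.rand(30, 3, 64, 64)

net = nn.Sequential(  # stand-in for the paper's image-to-image network
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

# Train on this one video pair only; stopping early matters, since training
# too long lets the network reproduce the frame-to-frame flicker as well.
for step in range(200):
    opt.zero_grad()
    loss = nn.functional.l1_loss(net(original), processed)
    loss.backward()
    opt.step()

consistent = net(original).detach()  # temporally smoothed output clip
```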
This list is automatically generated from the titles and abstracts of the papers on this site.