A Perceptual Quality Metric for Video Frame Interpolation
- URL: http://arxiv.org/abs/2210.01879v1
- Date: Tue, 4 Oct 2022 19:56:10 GMT
- Title: A Perceptual Quality Metric for Video Frame Interpolation
- Authors: Qiqi Hou, Abhijay Ghildyal, Feng Liu
- Abstract summary: As video frame interpolation results often exhibit unique artifacts, existing quality metrics are sometimes inconsistent with human perception when measuring these results.
Some recent deep learning-based quality metrics are shown to be more consistent with human judgments, but their performance on videos is compromised since they do not consider temporal information.
Our method learns perceptual features directly from videos instead of individual frames.
- Score: 6.743340926667941
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research on video frame interpolation has made significant progress in recent
years. However, existing methods mostly use off-the-shelf metrics to measure
the quality of interpolation results with the exception of a few methods that
employ user studies, which are time-consuming. As video frame interpolation
results often exhibit unique artifacts, existing quality metrics sometimes are
not consistent with human perception when measuring the interpolation results.
Some recent deep learning-based perceptual quality metrics are shown to be more
consistent with human judgments, but their performance on videos is compromised
since they do not consider temporal information. In this paper, we present a
dedicated perceptual quality metric for measuring video frame interpolation
results. Our method learns perceptual features directly from videos instead of
individual frames. It compares pyramid features extracted from video frames and
employs spatio-temporal modules based on Swin Transformer blocks to extract
spatio-temporal information. To train our metric, we collected a new video
frame interpolation quality assessment dataset. Our experiments show that our
dedicated quality metric outperforms state-of-the-art methods when measuring
video frame interpolation results. Our code and model are made publicly
available at \url{https://github.com/hqqxyy/VFIPS}.
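The abstract above describes the overall recipe: extract multi-scale (pyramid) features from the reference and interpolated clips, compare them per level, and let a spatio-temporal module aggregate the differences into a score. The sketch below illustrates that recipe in PyTorch at a toy scale only; it is not the released VFIPS code, and every class, layer size, and weighting choice here is a hypothetical stand-in (in particular, a simple learned per-level weight replaces the paper's Swin Transformer block-based spatio-temporal modules).
```python
# Minimal, illustrative sketch (not the authors' VFIPS implementation) of a
# learned video perceptual metric: pyramid features from both clips are
# compared level by level and pooled over space and time into one score.
import torch
import torch.nn as nn


class PyramidEncoder(nn.Module):
    """Tiny 3D-conv feature pyramid over a clip of shape (B, C, T, H, W)."""

    def __init__(self, channels=(16, 32, 64)):
        super().__init__()
        stages, in_ch = [], 3
        for out_ch in channels:
            stages.append(nn.Sequential(
                nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                # Downsample spatially only; keep the temporal dimension.
                nn.MaxPool3d(kernel_size=(1, 2, 2)),
            ))
            in_ch = out_ch
        self.stages = nn.ModuleList(stages)

    def forward(self, clip):
        feats, x = [], clip
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats


class ToyVideoPerceptualMetric(nn.Module):
    """Compares pyramid features of two clips and outputs a distance score."""

    def __init__(self):
        super().__init__()
        self.encoder = PyramidEncoder()
        # One learned weight per pyramid level; the paper's spatio-temporal
        # modules (Swin Transformer blocks) would take this role instead.
        self.level_weights = nn.Parameter(torch.ones(3))

    def forward(self, interpolated, reference):
        feats_a = self.encoder(interpolated)
        feats_b = self.encoder(reference)
        score = 0.0
        for w, fa, fb in zip(self.level_weights, feats_a, feats_b):
            # Mean squared feature difference, pooled over time and space.
            score = score + w * (fa - fb).pow(2).mean()
        return score


if __name__ == "__main__":
    metric = ToyVideoPerceptualMetric()
    ref = torch.rand(1, 3, 8, 64, 64)          # batch, RGB, 8 frames, 64x64
    interp = ref + 0.05 * torch.randn_like(ref)
    print(float(metric(interp, ref)))          # lower = more similar
```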
Related papers
- Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos [13.368981834953981]
We propose the Fréchet Video Motion Distance metric, which focuses on evaluating motion consistency in video generation.
Specifically, we design explicit motion features based on key point tracking, and then measure the similarity between these features via the Fréchet distance (a minimal sketch of this distance computation appears after this list).
We carry out a large-scale human study, demonstrating that our metric effectively detects temporal noise and aligns better with human perceptions of generated video quality than existing metrics.
arXiv Detail & Related papers (2024-07-23T02:10:50Z)
- STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models [6.855409699832414]
Video generative models struggle to generate even short video clips.
Current video evaluation metrics are simple adaptations of image metrics by switching the embeddings with video embedding networks.
We propose STREAM, a new video evaluation metric uniquely designed to independently evaluate spatial and temporal aspects.
arXiv Detail & Related papers (2024-01-30T08:18:20Z)
- Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted sequence, leveraging the spatio-temporal coherence and internal statistics of the video.
arXiv Detail & Related papers (2023-12-13T01:57:11Z)
- FloLPIPS: A Bespoke Video Quality Metric for Frame Interpolation [4.151439675744056]
We present a bespoke full reference video quality metric for VFI, FloLPIPS, that builds on the popular perceptual image quality metric, LPIPS.
FloLPIPS shows superior correlation performance with subjective ground truth over 12 popular quality assessors.
arXiv Detail & Related papers (2022-07-17T09:07:33Z)
- Revealing Single Frame Bias for Video-and-Language Learning [115.01000652123882]
We show that a single-frame trained model can achieve better performance than existing methods that use multiple frames for training.
This result reveals the existence of a strong "static appearance bias" in popular video-and-language datasets.
We propose two new retrieval tasks based on existing fine-grained action recognition datasets that encourage temporal modeling.
arXiv Detail & Related papers (2022-06-07T16:28:30Z)
- Efficient Video Segmentation Models with Per-frame Inference [117.97423110566963]
We focus on improving the temporal consistency without introducing overhead in inference.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
arXiv Detail & Related papers (2022-02-24T23:51:36Z)
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels.
Our method leverages per-frame person detectors that have been trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
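For the Fréchet-style metric listed above (the Fréchet Video Motion Distance entry), the distance itself is computed between two Gaussians fitted to feature sets: ||mu1 - mu2||^2 + Tr(Sigma1 + Sigma2 - 2*(Sigma1*Sigma2)^(1/2)). The sketch below shows only this distance computation on random stand-in features; it is not the authors' released implementation, and in the actual metric the features would come from key-point tracking.
```python
# Minimal numerical sketch of the Fréchet distance between two Gaussians
# fitted to feature sets (the quantity underlying Fréchet-style video metrics).
import numpy as np
from scipy import linalg


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """feats_*: (num_samples, feature_dim) arrays of extracted features."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)

    diff = mu_a - mu_b
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(sigma_a @ sigma_b, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerics

    return float(diff @ diff + np.trace(sigma_a + sigma_b - 2.0 * covmean))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(1000, 16))   # stand-in "real" features
    gen = rng.normal(0.2, 1.1, size=(1000, 16))    # stand-in "generated" features
    print(frechet_distance(real, gen))             # larger = distributions differ more
```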