An Efficient Quality Metric for Video Frame Interpolation Based on Motion-Field Divergence
- URL: http://arxiv.org/abs/2510.01361v1
- Date: Wed, 01 Oct 2025 18:40:38 GMT
- Title: An Efficient Quality Metric for Video Frame Interpolation Based on Motion-Field Divergence
- Authors: Conall Daly, Darren Ramsook, Anil Kokaram,
- Abstract summary: Video frame is a fundamental tool for temporal video enhancement, but existing quality metrics struggle to evaluate the impact of artefacts effectively.<n>We present $textPSNR_textDIV$, a novel full-reference quality metric that enhances PSNR through motion divergence weighting.<n>Our approach highlights singularities in motion fields which is then used to weight image errors.
- Score: 0.3823356975862005
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Video frame interpolation is a fundamental tool for temporal video enhancement, but existing quality metrics struggle to evaluate the perceptual impact of interpolation artefacts effectively. Metrics like PSNR, SSIM and LPIPS ignore temporal coherence. State-of-the-art quality metrics tailored towards video frame interpolation, like FloLPIPS, have been developed but suffer from computational inefficiency that limits their practical application. We present $\text{PSNR}_{\text{DIV}}$, a novel full-reference quality metric that enhances PSNR through motion divergence weighting, a technique adapted from archival film restoration where it was developed to detect temporal inconsistencies. Our approach highlights singularities in motion fields which is then used to weight image errors. Evaluation on the BVI-VFI dataset (180 sequences across multiple frame rates, resolutions and interpolation methods) shows $\text{PSNR}_{\text{DIV}}$ achieves statistically significant improvements: +0.09 Pearson Linear Correlation Coefficient over FloLPIPS, while being 2.5$\times$ faster and using 4$\times$ less memory. Performance remains consistent across all content categories and are robust to the motion estimator used. The efficiency and accuracy of $\text{PSNR}_{\text{DIV}}$ enables fast quality evaluation and practical use as a loss function for training neural networks for video frame interpolation tasks. An implementation of our metric is available at www.github.com/conalld/psnr-div.
Related papers
- Efficient motion-based metrics for video frame interpolation [0.3823356975862005]
We propose a motion metric based on measuring the divergence of motion fields.<n>We then use our new proposed metrics to evaluate a range of state of the art frame metrics.
arXiv Detail & Related papers (2025-08-12T16:57:22Z) - Video Dynamics Prior: An Internal Learning Approach for Robust Video
Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted sequence, leveraging the weights of the coherence-temporal test and statistics internal statistics.
arXiv Detail & Related papers (2023-12-13T01:57:11Z) - IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frame relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z) - Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame is an important low-level computation vision task, which can increase frame rate for more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation up to 40% while maintaining similar accuracy, making it perform more efficiently against other state-of-the-arts.
arXiv Detail & Related papers (2023-09-07T06:41:15Z) - You Can Ground Earlier than See: An Effective and Efficient Pipeline for
Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous respectable works have made decent success, but they only focus on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z) - FloLPIPS: A Bespoke Video Quality Metric for Frame Interpoation [4.151439675744056]
We present a bespoke full reference video quality metric for VFI, FloLPIPS, that builds on the popular perceptual image quality metric, LPIPS.
FloLPIPS shows superior correlation performance with subjective ground truth over 12 popular quality assessors.
arXiv Detail & Related papers (2022-07-17T09:07:33Z) - FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video framesupervised.
We demonstrate that FLAVR can serve as a useful self- pretext task for action recognition, optical flow estimation, and motion magnification.
arXiv Detail & Related papers (2020-12-15T18:59:30Z) - A Backbone Replaceable Fine-tuning Framework for Stable Face Alignment [21.696696531924374]
We propose a Jitter loss function that leverages temporal information to suppress inaccurate as well as jittered landmarks.
The proposed framework achieves at least 40% improvement on stability evaluation metrics.
It can swiftly convert a landmark detector for facial images to a better-performing one for videos without retraining the entire model.
arXiv Detail & Related papers (2020-10-19T13:40:39Z) - All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced
Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions interpolating one frame at the time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramidal style network in the temporal domain to complete the multi-frame task in one-shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.