Exploring Motion Ambiguity and Alignment for High-Quality Video Frame
Interpolation
- URL: http://arxiv.org/abs/2203.10291v1
- Date: Sat, 19 Mar 2022 10:37:06 GMT
- Title: Exploring Motion Ambiguity and Alignment for High-Quality Video Frame
Interpolation
- Authors: Kun Zhou, Wenbo Li, Xiaoguang Han, Jiangbo Lu
- Abstract summary: We propose to relax the requirement of reconstructing an intermediate frame as close to the ground-truth (GT) as possible.
We develop a texture consistency loss (TCL) based on the assumption that interpolated content should maintain structures similar to its counterparts in the given frames.
- Score: 46.02120172459727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For video frame interpolation (VFI), existing deep-learning-based approaches
strongly rely on the ground-truth (GT) intermediate frames, which sometimes
ignore the non-unique nature of motion judging from the given adjacent frames.
As a result, these methods tend to produce blurry, averaged solutions. To
alleviate this issue, we propose to relax the requirement of
reconstructing an intermediate frame as close to the GT as possible. Towards
this end, we develop a texture consistency loss (TCL) upon the assumption that
the interpolated content should maintain structures similar to its
counterparts in the given frames. Predictions satisfying this constraint are
encouraged, though they may differ from the pre-defined GT. Without bells and
whistles, our plug-and-play TCL improves the performance of
existing VFI frameworks. On the other hand, previous methods usually adopt the
cost volume or correlation map to achieve more accurate image/feature warping.
However, the O(N^2) (N denotes the pixel count) computational complexity
makes it infeasible for high-resolution cases. In this work, we design a
simple, efficient (O(N)) yet powerful cross-scale pyramid alignment (CSPA)
module, where multi-scale information is highly exploited. Extensive
experiments justify the efficiency and effectiveness of the proposed strategy.
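The TCL idea can be sketched as a patch-level relaxation of the per-pixel loss: each predicted patch is supervised by the best-matching ground-truth patch in a small search window, rather than only by the strictly aligned one. The NumPy sketch below is illustrative only; the function name, single-channel formulation, and patch/window sizes are assumptions, not the paper's implementation:

```python
import numpy as np

def texture_consistency_loss(pred, gt, patch=3, radius=1):
    """Relaxed reconstruction loss (sketch): for every patch in the
    prediction, take the L1 distance to the best-matching patch in a
    small search window of the ground truth, instead of forcing a
    pixel-aligned match."""
    H, W = pred.shape
    half = patch // 2
    total, count = 0.0, 0
    for y in range(half, H - half):
        for x in range(half, W - half):
            p = pred[y - half:y + half + 1, x - half:x + half + 1]
            best = np.inf
            # search a (2*radius+1)^2 neighborhood in the GT
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if half <= yy < H - half and half <= xx < W - half:
                        q = gt[yy - half:yy + half + 1, xx - half:xx + half + 1]
                        best = min(best, np.abs(p - q).mean())
            total += best
            count += 1
    return total / count
```

Under this loss, a prediction whose texture is shifted by one pixel relative to the GT is barely penalized, whereas a plain L1 loss would penalize it heavily; this is the sense in which plausible non-GT predictions are "encouraged".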
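The O(N) claim rests on replacing the global cost volume with fixed-size local searches repeated across a pyramid of scales: a constant amount of matching work per pixel per level. The coarse-to-fine sketch below illustrates that complexity argument on raw intensities; it is a generic stand-in, not the authors' CSPA module (which operates on learned multi-scale features), and `downsample`, `local_match`, and `pyramid_align` are hypothetical names:

```python
import numpy as np

def downsample(img):
    # 2x average-pool (assumes even dimensions)
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def local_match(src, ref, flow, radius=1):
    """Refine each pixel's offset by searching a fixed (2*radius+1)^2
    window around the current estimate: constant work per pixel,
    hence O(N) per level."""
    H, W = src.shape
    out = flow.copy()
    for y in range(H):
        for x in range(W):
            fy, fx = flow[y, x]
            best, best_d = (fy, fx), np.inf
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = int(np.clip(y + fy + dy, 0, H - 1))
                    xx = int(np.clip(x + fx + dx, 0, W - 1))
                    d = abs(src[y, x] - ref[yy, xx])
                    if d < best_d:
                        best_d, best = d, (yy - y, xx - x)
            out[y, x] = best
    return out

def pyramid_align(src, ref, levels=2):
    """Coarse-to-fine alignment: match at the coarsest scale, then
    upsample the offsets (doubling their magnitude) and refine at
    each finer scale, so large motions are handled with small windows."""
    srcs, refs = [src], [ref]
    for _ in range(levels - 1):
        srcs.append(downsample(srcs[-1]))
        refs.append(downsample(refs[-1]))
    flow = np.zeros(srcs[-1].shape + (2,))
    for s, r in zip(reversed(srcs), reversed(refs)):
        if flow.shape[:2] != s.shape:
            flow = 2 * np.repeat(np.repeat(flow, 2, axis=0), 2, axis=1)
        flow = local_match(s, r, flow)
    return flow
```

A motion of 2 pixels is found with a radius-1 search because it shrinks to 1 pixel at half resolution; the geometric sum of per-level costs keeps the whole procedure linear in the pixel count, in contrast to the O(N^2) cost volume.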
Related papers
- Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task that can increase the frame rate for a more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation up to 40% while maintaining similar accuracy, making it perform more efficiently against other state-of-the-arts.
arXiv Detail & Related papers (2023-09-07T06:41:15Z) - RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z) - H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions [63.23985601478339]
We propose a simple yet effective solution, H-VFI, to deal with large motions in video frame interpolation.
H-VFI contributes a hierarchical video transformer to learn a deformable kernel in a coarse-to-fine strategy.
The advantage of such a progressive approximation is that the large-motion problem can be decomposed into several relatively simpler sub-tasks.
arXiv Detail & Related papers (2022-11-21T09:49:23Z) - Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
The coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while the more efficient fine-scale module speeds up the estimation process.
arXiv Detail & Related papers (2022-07-14T09:17:00Z) - IFRNet: Intermediate Feature Refine Network for Efficient Frame
Interpolation [44.04110765492441]
We devise an efficient encoder-decoder based network, termed IFRNet, for fast intermediate frame synthesizing.
Experiments on various benchmarks demonstrate the excellent performance and fast inference speed of proposed approaches.
arXiv Detail & Related papers (2022-05-29T10:18:18Z) - Long-term Video Frame Interpolation via Feature Propagation [95.18170372022703]
Video frame interpolation (VFI) works generally predict intermediate frame(s) by first estimating the motion between the inputs and then warping the inputs to the target time with the estimated motion.
This approach is not optimal when the temporal distance between the input frames increases.
We propose a propagation network (PNet) by extending the classic feature-level forecasting with a novel motion-to-feature approach.
arXiv Detail & Related papers (2022-03-29T10:47:06Z) - Temporal Feature Alignment and Mutual Information Maximization for
Video-Based Human Pose Estimation [38.571715193347366]
We present a novel hierarchical alignment framework for multi-frame human pose estimation.
We rank No.1 in the Multi-frame Person Pose Estimation Challenge on the PoseTrack 2017 benchmark, and obtain state-of-the-art performance on the Sub-JHMDB and PoseTrack 2018 benchmarks.
arXiv Detail & Related papers (2022-03-29T04:29:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content and is not responsible for any consequences of its use.