H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions
- URL: http://arxiv.org/abs/2211.11309v1
- Date: Mon, 21 Nov 2022 09:49:23 GMT
- Title: H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions
- Authors: Changlin Li, Guangyang Wu, Yanan Sun, Xin Tao, Chi-Keung Tang, Yu-Wing Tai
- Abstract summary: We propose a simple yet effective solution, H-VFI, to deal with large motions in video frame interpolation.
H-VFI contributes a hierarchical video transformer to learn a deformable kernel in a coarse-to-fine strategy.
The advantage of such a progressive approximation is that the large motion frame interpolation problem can be decomposed into several relatively simpler sub-tasks.
- Score: 63.23985601478339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Capitalizing on the rapid development of neural networks, recent video frame
interpolation (VFI) methods have achieved notable improvements. However, they
still fall short for real-world videos containing large motions. Complex
deformation and/or occlusion caused by large motions make it an extremely
difficult problem in video frame interpolation. In this paper, we propose a
simple yet effective solution, H-VFI, to deal with large motions in video frame
interpolation. H-VFI contributes a hierarchical video interpolation transformer
(HVIT) to learn a deformable kernel in a coarse-to-fine strategy at multiple
scales. The learnt deformable kernel is then used to convolve the input
frames for predicting the interpolated frame. Starting from the smallest scale,
H-VFI successively updates the deformable kernel with a residual based on
previously predicted kernels, intermediate interpolated results, and
hierarchical transformer features. A transformer block then predicts bias and
masks from the interpolated results to refine the final outputs. The advantage of such a
progressive approximation is that the large motion frame interpolation problem
can be decomposed into several relatively simpler sub-tasks, which enables a
very accurate prediction in the final results. Another noteworthy contribution
of our paper consists of a large-scale high-quality dataset, YouTube200K, which
contains videos depicting a great variety of scenarios captured at high
resolution and high frame rate. Extensive experiments on multiple frame
interpolation benchmarks validate that H-VFI outperforms existing
state-of-the-art methods especially for videos with large motions.
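The coarse-to-fine residual scheme in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the identity-kernel initialization, the nearest-neighbor upsampling, and the plug-in `predict_residual` callback are all illustrative assumptions, and the transformer that actually predicts the residuals (plus the bias/mask refinement step) is omitted. The sketch only shows the structural idea: a per-pixel deformable kernel is initialized at the coarsest scale, upsampled and corrected by a residual at each finer scale, and finally used to adaptively convolve the input frames.

```python
import numpy as np

def upsample2x(kernels):
    # Nearest-neighbor upsampling of a per-pixel kernel field (H, W, K*K).
    return kernels.repeat(2, axis=0).repeat(2, axis=1)

def apply_kernel(frame, kernels, k=3):
    # Adaptive convolution: convolve each pixel with its own k x k kernel.
    H, W = frame.shape
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.empty((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k].ravel()
            out[i, j] = patch @ kernels[i, j]
    return out

def hierarchical_interpolate(frame0, frame1, predict_residual, scales=3, k=3):
    # Coarse-to-fine: initialize the kernel at the smallest scale, then
    # refine it with a residual at each finer scale (a stand-in for the
    # transformer-predicted residuals in H-VFI).
    H, W = frame0.shape
    s = 2 ** (scales - 1)
    kernels = np.zeros((H // s, W // s, k * k))
    kernels[..., (k * k) // 2] = 1.0  # identity kernel at the coarsest level
    for level in range(scales):
        if level > 0:
            kernels = upsample2x(kernels)
        kernels = kernels + predict_residual(kernels, level)
    # Convolve both input frames with the final kernel and blend.
    return 0.5 * (apply_kernel(frame0, kernels, k) + apply_kernel(frame1, kernels, k))
```

With an all-zero residual predictor the kernel stays the identity, so the output is simply the average of the two input frames; a learned predictor would shift the kernel taps to compensate for motion at each scale.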
Related papers
- ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation.
We introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning.
Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
arXiv Detail & Related papers (2024-10-08T03:01:54Z) - Video Frame Interpolation with Flow Transformer [31.371987879960287]
Video frame interpolation has been actively studied with the development of convolutional neural networks.
We propose Video Frame Interpolation Flow Transformer to incorporate motion dynamics from optical flows into the self-attention mechanism.
Our framework is suitable for interpolating frames with large motion while maintaining reasonably low complexity.
arXiv Detail & Related papers (2023-07-30T06:44:37Z) - E-VFIA : Event-Based Video Frame Interpolation with Attention [8.93294761619288]
We propose an event-based video frame interpolation method with attention (E-VFIA) as a lightweight kernel-based method.
E-VFIA fuses event information with standard video frames by deformable convolutions to generate high quality interpolated frames.
The proposed method represents events with high temporal resolution and uses a multi-head self-attention mechanism to better encode event-based information.
arXiv Detail & Related papers (2022-09-19T21:40:32Z) - TTVFI: Learning Trajectory-Aware Transformer for Video Frame
Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI)
Our method outperforms other state-of-the-art methods on four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z) - Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
The coarse-scale module leverages neighbor correspondences to capture large motion, while the more efficient fine-scale module speeds up the estimation process.
arXiv Detail & Related papers (2022-07-14T09:17:00Z) - Exploring Motion Ambiguity and Alignment for High-Quality Video Frame
Interpolation [46.02120172459727]
We propose to relax the requirement of reconstructing an intermediate frame as close to the ground-truth (GT) as possible.
We develop a texture consistency loss (TCL) upon the assumption that the interpolated content should maintain similar structures with their counterparts in the given frames.
arXiv Detail & Related papers (2022-03-19T10:37:06Z) - Enhanced Quadratic Video Interpolation [56.54662568085176]
We propose an enhanced quadratic video interpolation (EQVI) model to handle more complicated scenes and motion patterns.
To further boost the performance, we devise a novel multi-scale fusion network (MS-Fusion) which can be regarded as a learnable augmentation process.
The proposed EQVI model won the first place in the AIM 2020 Video Temporal Super-Resolution Challenge.
arXiv Detail & Related papers (2020-09-10T02:31:50Z) - All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced
Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions that interpolate one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramidal style network in the temporal domain to complete the multi-frame task in one-shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.