PDWN: Pyramid Deformable Warping Network for Video Interpolation
- URL: http://arxiv.org/abs/2104.01517v1
- Date: Sun, 4 Apr 2021 02:08:57 GMT
- Title: PDWN: Pyramid Deformable Warping Network for Video Interpolation
- Authors: Zhiqi Chen, Ran Wang, Haojie Liu and Yao Wang
- Abstract summary: We propose a light but effective model, called Pyramid Deformable Warping Network (PDWN).
PDWN uses a pyramid structure to generate DConv offsets of the unknown middle frame with respect to the known frames through coarse-to-fine successive refinements.
Our method achieves better or on-par accuracy compared to state-of-the-art models on multiple datasets.
- Score: 11.62213584807003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video interpolation aims to generate a non-existent intermediate frame given
the past and future frames. Many state-of-the-art methods achieve promising
results by estimating the optical flow between the known frames and then
generating the backward flows between the middle frame and the known frames.
However, these methods usually suffer from the inaccuracy of estimated optical
flows and require additional models or information to compensate for flow
estimation errors. Following the recent development in using deformable
convolution (DConv) for video interpolation, we propose a light but effective
model, called Pyramid Deformable Warping Network (PDWN). PDWN uses a pyramid
structure to generate DConv offsets of the unknown middle frame with respect to
the known frames through coarse-to-fine successive refinements. Cost volumes
between warped features are calculated at every pyramid level to help the
offset inference. At the finest scale, the two warped frames are adaptively
blended to generate the middle frame. Lastly, a context enhancement network
further enhances the contextual detail of the final output. Ablation studies
demonstrate the effectiveness of the coarse-to-fine offset refinement, cost
volumes, and DConv. Our method achieves better or on-par accuracy compared to
state-of-the-art models on multiple datasets, while the number of model
parameters and the inference time are substantially lower than those of previous models.
Moreover, we present an extension of the proposed framework to use four input
frames, which can achieve significant improvement over using only two input
frames, with only a slight increase in the model size and inference time.
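The coarse-to-fine offset refinement, cost volumes, and deformable warping described above can be made concrete with a short sketch. Below is a minimal PyTorch illustration of one pyramid level, built only from the abstract: the module widths, the correlation radius, and the names (local_cost_volume, PyramidLevel) are assumptions for illustration, not the released PDWN implementation.

```python
# A minimal sketch of one PDWN-style pyramid level, assuming PyTorch with
# torchvision. Widths, radius, and names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

def local_cost_volume(feat_a, feat_b, radius=3):
    """PWC-Net-style correlation between feat_a and shifted copies of
    feat_b within a (2*radius+1)^2 search window."""
    _, _, h, w = feat_a.shape
    padded = F.pad(feat_b, [radius] * 4)
    costs = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[:, :, dy:dy + h, dx:dx + w]
            costs.append((feat_a * shifted).mean(dim=1, keepdim=True))
    return torch.cat(costs, dim=1)  # (B, (2*radius+1)^2, H, W)

class PyramidLevel(nn.Module):
    """One coarse-to-fine refinement step: warp both frame features with
    the offsets upsampled from the coarser level, compare them with a
    cost volume, and predict residual DConv offsets for each frame."""
    def __init__(self, channels, radius=3, ksize=3):
        super().__init__()
        self.radius = radius
        offset_ch = 2 * ksize * ksize  # one (dy, dx) pair per kernel tap
        self.offset_head = nn.Sequential(
            nn.Conv2d((2 * radius + 1) ** 2 + 2 * channels, 96, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(96, 2 * offset_ch, 3, padding=1),  # offsets for both frames
        )
        self.dconv = DeformConv2d(channels, channels, ksize, padding=ksize // 2)

    def forward(self, feat0, feat1, off0, off1):
        warped0 = self.dconv(feat0, off0)  # deformable warp toward the middle frame
        warped1 = self.dconv(feat1, off1)
        cost = local_cost_volume(warped0, warped1, self.radius)
        res0, res1 = self.offset_head(
            torch.cat([cost, warped0, warped1], dim=1)).chunk(2, dim=1)
        return off0 + res0, off1 + res1  # refined offsets for this level

# At the finest scale the two deformably warped frames would be blended,
# out = m * warped0 + (1 - m) * warped1, with a predicted soft mask m,
# before a context enhancement network refines the result.
```

A full network would stack several such levels, upsampling the offsets between scales so that each level only predicts a residual correction, which matches the coarse-to-fine refinement the abstract describes.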
Related papers
- ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation.
We introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning.
Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
arXiv Detail & Related papers (2024-10-08T03:01:54Z)
- OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation [55.676358801492114]
We propose OCAI, a method that supports robust frame interpolation by generating intermediate video frames alongside optical flows in between.
Our evaluations demonstrate superior quality and enhanced optical flow accuracy on established benchmarks such as Sintel and KITTI.
arXiv Detail & Related papers (2024-03-26T20:23:48Z)
- IDO-VFI: Identifying Dynamics via Optical Flow Guidance for Video Frame Interpolation with Events [14.098949778274733]
Event cameras are ideal for capturing inter-frame dynamics with their extremely high temporal resolution.
We propose an event-and-frame-based video frame interpolation method named IDO-VFI that assigns varying amounts of computation to different sub-regions.
Our proposed method maintains high-quality performance while reducing computation time and computational effort by 10% and 17% respectively on the Vimeo90K dataset.
arXiv Detail & Related papers (2023-05-17T13:22:21Z)
- Long-term Video Frame Interpolation via Feature Propagation [95.18170372022703]
Video frame interpolation (VFI) works generally predict intermediate frame(s) by first estimating the motion between the inputs and then warping the inputs to the target time with the estimated motion.
This approach is not optimal when the temporal distance between the input frames increases.
We propose a propagation network (PNet) by extending the classic feature-level forecasting with a novel motion-to-feature approach.
arXiv Detail & Related papers (2022-03-29T10:47:06Z)
- Enhanced Correlation Matching based Video Frame Interpolation [5.304928339627251]
We propose a novel framework called the Enhanced Correlation Matching based Video Frame Interpolation Network.
The proposed scheme employs a recurrent pyramid architecture that shares parameters across pyramid layers for optical flow estimation (see the sketch after this entry).
Experimental results demonstrate that the proposed scheme outperforms previous works on both 4K video data and low-resolution benchmark datasets in terms of objective and subjective quality.
arXiv Detail & Related papers (2021-11-17T02:43:45Z)
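The parameter sharing mentioned in the entry above (one set of weights reused at every pyramid level) can be sketched in a few lines. This is a generic illustration with assumed channel sizes and a made-up module name, not the paper's network:

```python
# Minimal sketch of a recurrent (parameter-shared) pyramid: one estimator
# module is reused at every scale, so the parameter count is independent
# of pyramid depth. Channel sizes are assumptions, not the paper's network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPyramidFlow(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        # The single estimator that serves every pyramid level.
        self.estimator = nn.Sequential(
            nn.Conv2d(2 * channels + 2, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1),
        )

    def forward(self, pyr0, pyr1):
        """pyr0, pyr1: lists of feature maps for the two frames, coarsest first."""
        b, _, h, w = pyr0[0].shape
        flow = torch.zeros(b, 2, h, w, device=pyr0[0].device)
        for f0, f1 in zip(pyr0, pyr1):
            # Upsample the coarse flow and double it: displacements
            # measured in pixels scale with resolution.
            flow = 2.0 * F.interpolate(flow, size=f0.shape[-2:],
                                       mode="bilinear", align_corners=False)
            flow = flow + self.estimator(torch.cat([f0, f1, flow], dim=1))
        return flow
```

Because the same estimator is applied at every scale, adding pyramid levels increases computation but not the parameter count.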
- TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)
- RAI-Net: Range-Adaptive LiDAR Point Cloud Frame Interpolation Network [5.225160072036824]
LiDAR point cloud frame interpolation, which synthesizes the intermediate frame between captured frames, has emerged as an important issue for many applications.
We propose a novel LiDAR point cloud frame interpolation method, which exploits range images (RIs) as an intermediate representation and uses CNNs to conduct the interpolation process (see the sketch after this entry).
Our method consistently achieves superior interpolation results with better perceptual quality than state-of-the-art video frame interpolation methods.
arXiv Detail & Related papers (2021-06-01T13:59:08Z)
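The range images (RIs) mentioned in the entry above are typically obtained by spherically projecting the point cloud onto an azimuth-elevation grid, which lets ordinary 2D CNNs process LiDAR data. A generic sketch of that projection follows; the field-of-view bounds and resolution are common LiDAR defaults assumed for illustration, not RAI-Net's settings:

```python
# Generic spherical projection of a LiDAR point cloud to a range image.
# FOV bounds and image size are common LiDAR defaults, assumed here for
# illustration rather than taken from RAI-Net.
import numpy as np

def to_range_image(points, h=64, w=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """points: (N, 3) array of x, y, z coordinates. Returns an (h, w)
    image whose pixels hold the range of the nearest projected point."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8
    yaw = np.arctan2(y, x)              # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)            # elevation angle
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    u = 0.5 * (1.0 - yaw / np.pi) * w               # column from azimuth
    v = (fov_up - pitch) / (fov_up - fov_down) * h  # row from elevation
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int64)
    img = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(-r)              # write far points first so the
    img[v[order], u[order]] = r[order]  # nearest return wins per pixel
    return img
```

Interpolated range images can then be projected back to 3D points using the same angular grid.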
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z)
- All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions that interpolate one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramid-style network in the temporal domain to complete the multi-frame interpolation task in one shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.