FILM: Frame Interpolation for Large Motion
- URL: http://arxiv.org/abs/2202.04901v2
- Date: Sat, 12 Feb 2022 02:45:42 GMT
- Title: FILM: Frame Interpolation for Large Motion
- Authors: Fitsum Reda, Janne Kontkanen, Eric Tabellion, Deqing Sun, Caroline
Pantofaru, Brian Curless
- Abstract summary: We present a frame interpolation algorithm that synthesizes multiple intermediate frames from two input images with large in-between motion.
Our approach outperforms state-of-the-art methods on the Xiph large motion benchmark.
- Score: 20.04001872133824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a frame interpolation algorithm that synthesizes multiple
intermediate frames from two input images with large in-between motion. Recent
methods use multiple networks to estimate optical flow or depth and a separate
network dedicated to frame synthesis. This is often complex and requires scarce
optical flow or depth ground-truth. In this work, we present a single unified
network, distinguished by a multi-scale feature extractor that shares weights
at all scales, and is trainable from frames alone. To synthesize crisp and
pleasing frames, we propose to optimize our network with the Gram matrix loss
that measures the correlation difference between feature maps. Our approach
outperforms state-of-the-art methods on the Xiph large motion benchmark. We
also achieve higher scores on Vimeo-90K, Middlebury and UCF101, when comparing
to methods that use perceptual losses. We study the effect of weight sharing
and of training with datasets of increasing motion range. Finally, we
demonstrate our model's effectiveness in synthesizing high quality and
temporally coherent videos on a challenging near-duplicate photos dataset.
Codes and pre-trained models are available at
https://github.com/google-research/frame-interpolation.
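To make the Gram matrix loss concrete, here is a minimal PyTorch sketch (an illustrative assumption; the authors' released code may differ) of measuring the correlation difference between feature maps of a predicted frame and the ground truth:

    import torch

    def gram_matrix(feats):
        # feats: (B, C, H, W) feature maps from a pre-trained extractor (e.g. VGG)
        b, c, h, w = feats.shape
        f = feats.reshape(b, c, h * w)
        # (B, C, C) channel-correlation matrix, normalized by element count
        return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

    def gram_loss(pred_feats, gt_feats):
        # L2 distance between the Gram matrices of prediction and ground truth
        return torch.mean((gram_matrix(pred_feats) - gram_matrix(gt_feats)) ** 2)

In practice a loss like this is typically accumulated over several feature scales and combined with a pixel reconstruction term; the weighting shown here is illustrative, not the paper's.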
Related papers
- FusionFrames: Efficient Architectural Aspects for Text-to-Video
Generation Pipeline [4.295130967329365]
This paper presents a new two-stage latent diffusion text-to-video generation architecture based on the text-to-image diffusion model.
The design of our model significantly reduces computational costs compared to other masked frame approaches.
We evaluate different configurations of the MoVQ-based video decoding scheme to improve consistency and achieve better PSNR, SSIM, MSE, and LPIPS scores.
arXiv Detail & Related papers (2023-11-22T00:26:15Z)
- Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task, which can increase the frame rate for a more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-07T06:41:15Z)
- Progressive Motion Context Refine Network for Efficient Video Frame Interpolation [10.369068266836154]
Flow-based frame interpolation methods have achieved great success by first modeling the optical flow between target and input frames, and then building a synthesis network for target frame generation.
We propose a novel Progressive Motion Context Refine Network (PMCRNet) to predict motion fields and image context jointly for higher efficiency.
Experiments on multiple benchmarks show that the proposed approach not only achieves favorable quantitative results but also significantly reduces model size and running time.
arXiv Detail & Related papers (2022-11-11T06:29:03Z)
- TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI)
Our method outperforms other state-of-the-art methods on four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z)
- Efficient Video Segmentation Models with Per-frame Inference [117.97423110566963]
We focus on improving the temporal consistency without introducing overhead in inference.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
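As a rough illustration (a generic formulation, not necessarily this paper's exact loss), a temporal consistency term can penalize disagreement between per-frame predictions after warping one frame's output onto its neighbor with optical flow:

    import torch

    def temporal_consistency_loss(pred_t, pred_next_warped, valid_mask):
        # pred_t: per-pixel predictions at frame t, shape (B, C, H, W)
        # pred_next_warped: frame t+1 predictions warped to frame t via optical flow
        # valid_mask: (B, 1, H, W) mask of non-occluded pixels
        diff = (pred_t - pred_next_warped) ** 2
        return (diff * valid_mask).sum() / valid_mask.sum().clamp(min=1.0)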
arXiv Detail & Related papers (2022-02-24T23:51:36Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z)
- ALANET: Adaptive Latent Attention Network for Joint Video Deblurring and Interpolation [38.52446103418748]
We introduce a novel architecture, Adaptive Latent Attention Network (ALANET), which synthesizes sharp high frame-rate videos.
We employ a combination of self-attention and cross-attention modules between consecutive frames in the latent space to generate an optimized representation for each frame.
Our method performs favorably against various state-of-the-art approaches, even though we tackle a much more difficult problem.
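For intuition, a minimal sketch (hypothetical dimensions, not ALANET's actual architecture) of chaining self-attention and cross-attention over latent frame tokens in PyTorch:

    import torch
    import torch.nn as nn

    attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
    z_t = torch.randn(1, 64, 256)     # hypothetical latent tokens for frame t
    z_next = torch.randn(1, 64, 256)  # hypothetical latent tokens for frame t+1

    # Self-attention: frame t's tokens attend to themselves
    self_out, _ = attn(z_t, z_t, z_t)
    # Cross-attention: the refined tokens attend to the neighboring frame's tokens
    cross_out, _ = attn(self_out, z_next, z_next)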
arXiv Detail & Related papers (2020-08-31T21:11:53Z)
- All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions that interpolate one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramidal style network in the temporal domain to complete the multi-frame task in one shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z)
- W-Cell-Net: Multi-frame Interpolation of Cellular Microscopy Videos [1.7205106391379026]
We apply recent advances in deep video frame interpolation to increase the temporal resolution of fluorescent microscopy time-lapse movies.
To our knowledge, there is no previous work that uses Convolutional Neural Networks (CNNs) to generate frames between two consecutive microscopy images.
arXiv Detail & Related papers (2020-05-14T01:33:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.