Non-linear Motion Estimation for Video Frame Interpolation using
Space-time Convolutions
- URL: http://arxiv.org/abs/2201.11407v1
- Date: Thu, 27 Jan 2022 09:49:23 GMT
- Title: Non-linear Motion Estimation for Video Frame Interpolation using
Space-time Convolutions
- Authors: Saikat Dutta, Arulkumar Subramaniam, Anurag Mittal
- Abstract summary: Video frame interpolation aims to synthesize one or multiple frames between two consecutive frames in a video.
Some older works tackled this problem by assuming per-pixel linear motion between video frames.
We propose to approximate the per-pixel motion using a space-time convolution network that is able to adaptively select the motion model to be used.
- Score: 18.47978862083129
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video frame interpolation aims to synthesize one or multiple frames between
two consecutive frames in a video. It has a wide range of applications
including slow-motion video generation, frame-rate up-scaling and developing
video codecs. Some older works tackled this problem by assuming per-pixel
linear motion between video frames. However, objects often follow a non-linear
motion pattern in the real domain and some recent methods attempt to model
per-pixel motion by non-linear models (e.g., quadratic). A quadratic model can
also be inaccurate, especially in the case of motion discontinuities over time
(i.e. sudden jerks) and occlusions, where some of the flow information may be
invalid or inaccurate.
In our paper, we propose to approximate the per-pixel motion using a
space-time convolution network that is able to adaptively select the motion
model to be used. Specifically, we are able to softly switch between a linear
and a quadratic model. Towards this end, we use an end-to-end 3D CNN
encoder-decoder architecture over bidirectional optical flows and occlusion
maps to estimate the non-linear motion model of each pixel. Further, a motion
refinement module is employed to refine the non-linear motion and the
interpolated frames are estimated by a simple warping of the neighboring frames
with the estimated per-pixel motion. Through a set of comprehensive
experiments, we validate the effectiveness of our model and show that our
method outperforms state-of-the-art algorithms on four datasets (Vimeo, DAVIS,
HD and GoPro).
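To make the abstract's "soft switching" concrete, here is a minimal sketch, assuming PyTorch tensors and the constant-acceleration quadratic motion model used in prior quadratic-interpolation work (e.g., QVI/EQVI). The function names, the per-pixel weight map `alpha` (standing in for the output of the paper's 3D CNN encoder-decoder over bidirectional flows and occlusion maps), and the approximation F_{t->0} ≈ -F_{0->t} used for warping are all illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch: per-pixel soft selection between a linear and a
# quadratic motion model, followed by backward warping of a neighboring frame.
import torch
import torch.nn.functional as F

def candidate_flows(flow_0_to_1, flow_0_to_m1, t):
    """Candidate flows from I_0 to time t in (0, 1); inputs are (B, 2, H, W)."""
    linear = t * flow_0_to_1                        # constant-velocity model
    quad = (0.5 * (t ** 2 + t) * flow_0_to_1
            + 0.5 * (t ** 2 - t) * flow_0_to_m1)    # constant-acceleration model
    return linear, quad

def soft_select_and_warp(img0, linear, quad, alpha):
    """Blend the two candidates with a per-pixel weight and warp I_0 to time t.

    alpha: (B, 1, H, W) in [0, 1]; in the paper this role is played by the
    3D CNN over stacked flows and occlusion maps (hypothetical interface here).
    """
    flow_0_to_t = alpha * linear + (1.0 - alpha) * quad  # soft model selection
    # Crude backward warp: approximate F_{t->0} by -F_{0->t} and sample I_0.
    b, _, h, w = img0.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=img0.device, dtype=img0.dtype),
        torch.arange(w, device=img0.device, dtype=img0.dtype),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=0)                  # (2, H, W), x then y
    coords = base.unsqueeze(0) - flow_0_to_t             # (B, 2, H, W)
    # grid_sample expects a (B, H, W, 2) grid normalized to [-1, 1].
    grid = torch.stack(
        (2.0 * coords[:, 0] / (w - 1) - 1.0,
         2.0 * coords[:, 1] / (h - 1) - 1.0), dim=-1)
    return F.grid_sample(img0, grid, align_corners=True)
```

Note that the quadratic candidate reduces to F_{0->1} at t = 1 and to F_{0->-1} at t = -1, which makes it a drop-in generalization of the linear model; per the abstract, the paper additionally refines the blended flow with a motion refinement module before warping.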
Related papers
- VMC: Video Motion Customization using Temporal Attention Adaption for
Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
- MoVideo: Motion-Aware Video Generation with Diffusion Models [97.03352319694795]
We propose a novel motion-aware generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow.
MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
arXiv Detail & Related papers (2023-11-19T13:36:03Z)
- Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred Objects in Videos [115.71874459429381]
We propose a method for jointly estimating the 3D motion, 3D shape, and appearance of highly motion-blurred objects from a video.
Experiments on benchmark datasets demonstrate that our method outperforms previous methods for fast moving object deblurring and 3D reconstruction.
arXiv Detail & Related papers (2021-11-29T11:25:14Z)
- Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes [14.384467317051831]
We propose two novel approaches to deblurring videos by effectively aggregating information from multiple video frames.
First, we present blur-invariant motion estimation learning to improve motion estimation accuracy between blurry frames.
Second, for motion compensation, instead of aligning frames by warping with estimated motions, we use a pixel volume that contains candidate sharp pixels to resolve motion estimation errors.
arXiv Detail & Related papers (2021-08-23T07:36:49Z)
- Affine-modeled video extraction from a single motion blurred image [3.0080996413230667]
A motion-blurred image is the temporal average of multiple sharp frames over the exposure time.
In this work, we report a generalized video extraction method using the affine motion modeling.
Experiments on both public datasets and real captured data validate the state-of-the-art performance of the reported technique.
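As a quick formalization of the blur model stated above (a standard formation model, not quoted from the paper): with exposure time T, latent sharp frames S_t, and N discrete samples,

```latex
B(\mathbf{x}) \;=\; \frac{1}{T}\int_{0}^{T} S_t(\mathbf{x})\,dt
\;\approx\; \frac{1}{N}\sum_{k=1}^{N} S_k(\mathbf{x}),
```

so extracting a video from one blurred image amounts to inverting this average under a motion prior (affine, in the paper above).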
arXiv Detail & Related papers (2021-04-08T13:59:14Z)
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z)
- FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation.
We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
arXiv Detail & Related papers (2020-12-15T18:59:30Z)
- Enhanced Quadratic Video Interpolation [56.54662568085176]
We propose an enhanced quadratic video interpolation (EQVI) model to handle more complicated scenes and motion patterns.
To further boost the performance, we devise a novel multi-scale fusion network (MS-Fusion) which can be regarded as a learnable augmentation process.
The proposed EQVI model won the first place in the AIM 2020 Video Temporal Super-Resolution Challenge.
arXiv Detail & Related papers (2020-09-10T02:31:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.