Multiple Video Frame Interpolation via Enhanced Deformable Separable
Convolution
- URL: http://arxiv.org/abs/2006.08070v2
- Date: Mon, 25 Jan 2021 09:10:57 GMT
- Title: Multiple Video Frame Interpolation via Enhanced Deformable Separable
Convolution
- Authors: Xianhang Cheng and Zhenzhong Chen
- Abstract summary: Kernel-based methods predict pixels with a single convolution process that convolves source frames with spatially adaptive local kernels.
We propose enhanced deformable separable convolution (EDSC) to estimate not only adaptive kernels, but also offsets, masks and biases.
We show that our method performs favorably against the state-of-the-art methods across a broad range of datasets.
- Score: 67.83074893311218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating non-existing frames from a consecutive video sequence has been an
interesting and challenging problem in the video processing field. Typical
kernel-based interpolation methods predict pixels with a single convolution
process that convolves source frames with spatially adaptive local kernels,
which circumvents the time-consuming, explicit motion estimation in the form of
optical flow. However, when scene motion is larger than the pre-defined kernel
size, these methods are prone to yield less plausible results. In addition,
they cannot directly generate a frame at an arbitrary temporal position because
the learned kernels are tied to the midpoint in time between the input frames.
In this paper, we try to solve these problems and propose a novel non-flow
kernel-based approach that we refer to as enhanced deformable separable
convolution (EDSC) to estimate not only adaptive kernels, but also offsets,
masks and biases so that the network can gather information from a non-local
neighborhood. During the learning process, a different intermediate time step
can be involved as a control variable by means of an extension of the
coord-conv trick, allowing the estimated components to vary with the input
temporal information. This makes our method capable of producing multiple
in-between
frames. Furthermore, we investigate the relationships between our method and
other typical kernel- and flow-based methods. Experimental results show that
our method performs favorably against the state-of-the-art methods across a
broad range of datasets. Code will be publicly available at
\url{https://github.com/Xianhang/EDSC-pytorch}.
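As a concrete illustration of the mechanism above, the following is a minimal
PyTorch sketch of an EDSC-style synthesis step: per-pixel separable kernels
are combined with learned offsets, modulation masks, and a bias to sample a
non-local neighborhood of the source frame, and a coord-conv-style time
channel conditions the prediction on the target time step. All shapes, names,
and helpers here are illustrative assumptions, not the authors'
implementation (see the repository above for that).

```python
import torch
import torch.nn.functional as F

def edsc_synthesis(frame, kv, kh, offsets, masks, bias, K=5):
    """Sketch of deformable separable convolution synthesis (assumed shapes).

    frame:   (B, C, H, W)     source frame
    kv, kh:  (B, K, H, W)     per-pixel vertical / horizontal 1-D kernels
    offsets: (B, 2*K*K, H, W) learned (dy, dx) per kernel tap
    masks:   (B, K*K, H, W)   modulation weights
    bias:    (B, C, H, W)     additive residual
    """
    B, C, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().to(frame.device)  # (H, W, 2)
    norm = torch.tensor([W - 1.0, H - 1.0], device=frame.device)
    out = torch.zeros_like(frame)
    r = K // 2
    for idx in range(K * K):
        i, j = idx // K, idx % K
        dy, dx = offsets[:, 2 * idx], offsets[:, 2 * idx + 1]  # (B, H, W)
        # Fixed grid tap (i-r, j-r) plus a learned offset: non-local reach.
        loc = base.unsqueeze(0) + torch.stack(
            (dx + (j - r), dy + (i - r)), dim=-1)      # (B, H, W, 2)
        loc = 2.0 * loc / norm - 1.0          # normalize for grid_sample
        sample = F.grid_sample(frame, loc, align_corners=True)
        weight = (kv[:, i] * kh[:, j] * masks[:, idx]).unsqueeze(1)
        out = out + weight * sample
    return out + bias

def add_time_channel(feat, t):
    # Coord-conv-style conditioning: a constant channel holding the target
    # time step t lets one network predict components for any t in (0, 1).
    B, _, H, W = feat.shape
    return torch.cat((feat, feat.new_full((B, 1, H, W), t)), dim=1)
```

Because t enters only as an extra input channel, the same network can be
evaluated at any intermediate time step, which is what enables generating
multiple in-between frames.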
Related papers
- Meta-Interpolation: Time-Arbitrary Frame Interpolation via Dual
Meta-Learning [65.85319901760478]
We consider processing different time-steps with adaptively generated convolutional kernels in a unified way with the help of meta-learning.
We develop a dual meta-learned frame interpolation framework to synthesize intermediate frames with the guidance of context information and optical flow.
arXiv Detail & Related papers (2022-07-27T17:36:23Z)
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
A coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while a more efficient fine-scale module speeds up the estimation process.
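A minimal sketch of this kind of local correspondence matching, assuming a
plain correlation window (the window design and normalization are
illustrative, not the NCM paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def local_cost_volume(f1, f2, r=3):
    """Correlate each pixel of f1 against a (2r+1)^2 window in f2.

    f1, f2: (B, C, H, W) feature maps of the two input frames.
    Returns a (B, (2r+1)**2, H, W) local cost volume.
    """
    B, C, H, W = f1.shape
    f2_pad = F.pad(f2, (r, r, r, r))
    costs = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            shifted = f2_pad[:, :, r + dy:r + dy + H, r + dx:r + dx + W]
            costs.append((f1 * shifted).sum(dim=1, keepdim=True) / C ** 0.5)
    return torch.cat(costs, dim=1)
```

Computed on coarse, downsampled features, such a volume covers large motion
cheaply; a small window on fine features then keeps the refinement fast,
mirroring the coarse/fine split described above.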
arXiv Detail & Related papers (2022-07-14T09:17:00Z)
- Video Frame Interpolation Based on Deformable Kernel Region [18.55904569126297]
We propose a deformable convolution for video that breaks the fixed-grid restriction on the kernel region.
Experiments are conducted on four datasets to demonstrate the superior performance of the proposed model.
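The fixed-grid limitation the paper targets can be reproduced and relaxed
with torchvision's stock deformable convolution, where learned per-tap
offsets free the kernel from its regular grid. This is a generic usage
sketch, not the paper's model:

```python
import torch
from torchvision.ops import deform_conv2d

B, C, H, W, K = 1, 8, 64, 64, 3
x = torch.randn(B, C, H, W)
weight = torch.randn(16, C, K, K)         # 16 output channels
# In practice a small network predicts the offsets from the input frames;
# zeros reduce this to an ordinary convolution.
offset = torch.zeros(B, 2 * K * K, H, W)  # (dy, dx) per kernel tap and pixel
y = deform_conv2d(x, offset, weight, padding=K // 2)
print(y.shape)  # torch.Size([1, 16, 64, 64])
```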
arXiv Detail & Related papers (2022-04-25T02:03:04Z)
- Long-term Video Frame Interpolation via Feature Propagation [95.18170372022703]
Video frame interpolation (VFI) works generally predict intermediate frame(s) by first estimating the motion between the inputs and then warping the inputs to the target time with the estimated motion.
This approach is not optimal when the temporal distance between the input frames increases.
We propose a propagation network (PNet) by extending the classic feature-level forecasting with a novel motion-to-feature approach.
arXiv Detail & Related papers (2022-03-29T10:47:06Z)
- FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation.
We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
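A toy sketch of the flow-free idea: stack the input frames along a time axis
and let 3D space-time convolutions map them directly to the middle frame.
The layer sizes are arbitrary assumptions, not FLAVR's actual architecture:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv3d(3, 32, kernel_size=3, padding=1),   # input: (B, 3, T, H, W)
    nn.ReLU(inplace=True),
    nn.Conv3d(32, 32, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    # Collapse the T=4 time axis to predict a single RGB frame.
    nn.Conv3d(32, 3, kernel_size=(4, 3, 3), padding=(0, 1, 1)),
)

clips = torch.randn(2, 3, 4, 128, 128)  # batch of 4-frame clips
mid = model(clips).squeeze(2)           # (B, 3, H, W) interpolated frame
```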
arXiv Detail & Related papers (2020-12-15T18:59:30Z)
- Video Frame Interpolation via Generalized Deformable Convolution [18.357839820102683]
Video frame interpolation aims at synthesizing intermediate frames from nearby source frames while maintaining spatial and temporal consistencies.
Existing deep-learning-based video frame interpolation methods can be divided into two categories: flow-based methods and kernel-based methods.
A novel mechanism named generalized deformable convolution is proposed, which can effectively learn motion in a data-driven manner and freely select sampling points in space-time.
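Free sampling in space-time reduces to trilinear lookups into a stacked
frame volume at learned locations; a minimal sketch, with shapes and names
as assumptions rather than the paper's formulation:

```python
import torch
import torch.nn.functional as F

def spacetime_sample(frames, coords):
    """frames: (B, C, T, H, W) stacked source frames.
    coords: (B, H, W, 3) learned (x, y, t) locations, normalized to [-1, 1].
    """
    grid = coords.unsqueeze(1)                     # (B, 1, H, W, 3)
    out = F.grid_sample(frames, grid, align_corners=True)
    return out.squeeze(2)                          # (B, C, H, W)
```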
arXiv Detail & Related papers (2020-08-24T20:00:39Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution and design new knowledge distillation methods to narrow the performance gap between compact and large models.
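A generic sketch of distillation for this setting, pairing the usual
cross-entropy loss with a soft teacher-matching term; the temperature and
weighting here are assumptions, and the paper designs more specialized
distillation losses:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """logits: (B, num_classes, H, W); labels: (B, H, W) class indices."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kd
```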
arXiv Detail & Related papers (2020-02-26T12:24:32Z)