Enhanced Quadratic Video Interpolation
- URL: http://arxiv.org/abs/2009.04642v1
- Date: Thu, 10 Sep 2020 02:31:50 GMT
- Title: Enhanced Quadratic Video Interpolation
- Authors: Yihao Liu and Liangbin Xie and Li Siyao and Wenxiu Sun and Yu Qiao and
Chao Dong
- Abstract summary: We propose an enhanced quadratic video interpolation (EQVI) model to handle more complicated scenes and motion patterns.
To further boost the performance, we devise a novel multi-scale fusion network (MS-Fusion) which can be regarded as a learnable augmentation process.
The proposed EQVI model won first place in the AIM 2020 Video Temporal Super-Resolution Challenge.
- Score: 56.54662568085176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the prosperity of the digital video industry, video frame
interpolation has attracted continuous attention in the computer vision
community and become a new focus in industry. Many learning-based methods have
been proposed and achieved promising results. Among them, a recent algorithm
named quadratic
video interpolation (QVI) achieves appealing performance. It exploits
higher-order motion information (e.g. acceleration) and successfully models the
estimation of interpolated flow. However, the intermediate frames it produces
still contain unsatisfactory ghosting, artifacts, and inaccurate motion,
especially when large and complex motion occurs. In this work, we further
improve the performance of QVI from three facets and propose an enhanced
quadratic video interpolation (EQVI) model. In particular, we adopt a rectified
quadratic flow prediction (RQFP) formulation with least squares method to
estimate the motion more accurately. Complementary to image pixel-level
blending, we introduce a residual contextual synthesis network (RCSN) to employ
contextual information in high-dimensional feature space, which could help the
model handle more complicated scenes and motion patterns. Moreover, to further
boost the performance, we devise a novel multi-scale fusion network (MS-Fusion)
which can be regarded as a learnable augmentation process. The proposed EQVI
model won first place in the AIM 2020 Video Temporal Super-Resolution
Challenge.
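To make the rectified quadratic flow prediction (RQFP) idea concrete, the
following is a minimal NumPy sketch of fitting the per-pixel quadratic motion
model d(t) = v*t + 0.5*a*t^2 by least squares, assuming flows from reference
frame 0 to frames at t = -1, 1, 2 are already estimated. The function name,
array shapes, and observation times are illustrative assumptions, not the
authors' implementation.

    import numpy as np

    def fit_quadratic_flow(flow_0_m1, flow_0_1, flow_0_2, t):
        """Least-squares fit of d(t) = v*t + 0.5*a*t**2 per pixel.

        Inputs are (H, W, 2) flow fields from frame 0 to the frames at
        times -1, 1 and 2 (an assumption about the frame layout).
        Returns the interpolated flow from frame 0 to time t in (0, 1).
        """
        ts = np.array([-1.0, 1.0, 2.0])
        A = np.stack([ts, 0.5 * ts ** 2], axis=1)   # (3, 2) design matrix
        # Closed-form least squares (A^T A)^-1 A^T, shared by every pixel.
        pinv = np.linalg.inv(A.T @ A) @ A.T         # (2, 3)
        d = np.stack([flow_0_m1, flow_0_1, flow_0_2], axis=-1)  # (H, W, 2, 3)
        va = d @ pinv.T                             # (H, W, 2, 2): [v, a]
        v, a = va[..., 0], va[..., 1]
        return v * t + 0.5 * a * t ** 2

With only the two observations at t = -1 and 1, this least-squares solution
reduces to the closed-form quadratic model of the original QVI; the extra
observation is what allows the fit to rectify noisy flow estimates.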
Related papers
- MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion [3.7270979204213446]
We present four key contributions to address the challenges of video processing.
First, we introduce the 3D Inverted Vector-Quantization Variational Autoencoder.
Second, we present MotionAura, a text-to-video generation framework.
Third, we propose a spectral transformer-based denoising network.
Fourth, we introduce the downstream task of Sketch Guided Video Inpainting.
arXiv Detail & Related papers (2024-10-10T07:07:56Z)
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters (a generic Sinkhorn-Knopp sketch of this balancing step follows the related-papers list below).
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- Disentangled Motion Modeling for Video Frame Interpolation [40.83962594702387]
Video frame interpolation (VFI) aims to synthesize intermediate frames between existing frames to enhance visual smoothness and quality.
We introduce disentangled Motion Modeling (MoMo), a diffusion-based approach for VFI that enhances visual quality by focusing on intermediate motion modeling.
arXiv Detail & Related papers (2024-06-25T03:50:20Z)
- Decouple Content and Motion for Conditional Image-to-Video Generation [6.634105805557556]
The goal of conditional image-to-video (cI2V) generation is to create a believable new video from a given condition, i.e., one image and text.
Previous cI2V generation methods conventionally perform in RGB pixel space, with limitations in modeling motion consistency and visual continuity.
We propose a novel approach by disentangling the target RGB pixels into two distinct components: spatial content and temporal motions.
arXiv Detail & Related papers (2023-11-24T06:08:27Z)
- H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions [63.23985601478339]
We propose a simple yet effective solution, H-VFI, to deal with large motions in video frame interpolation.
H-VFI contributes a hierarchical video transformer to learn a deformable kernel in a coarse-to-fine strategy.
The advantage of such a progressive approximation is that the large-motion frame interpolation problem can be decomposed into several relatively simpler sub-tasks.
arXiv Detail & Related papers (2022-11-21T09:49:23Z)
- STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video Prediction [78.129039340528]
We propose a SpatioTemporal Residual Predictive Model (STRPM) for high-resolution video prediction.
Experimental results show that STRPM can generate more satisfactory results compared with various existing methods.
arXiv Detail & Related papers (2022-03-30T06:24:00Z)
- Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing [14.67994875448175]
Video snapshot compressive imaging (SCI) utilizes a 2D detector to capture sequential video frames and compress them into a single measurement.
Most existing reconstruction methods are incapable of efficiently capturing long-range spatial and temporal dependencies.
We propose a flexible and robust approach based on the graph neural network (GNN) to efficiently model non-local interactions between pixels in space and time regardless of the distance.
arXiv Detail & Related papers (2022-03-01T12:13:46Z)
- Non-linear Motion Estimation for Video Frame Interpolation using Space-time Convolutions [18.47978862083129]
Video frame interpolation aims to synthesize one or multiple frames between two consecutive frames in a video.
Some older works tackled this problem by assuming per-pixel linear motion between video frames.
We propose to approximate the per-pixel motion using a space-time convolution network that is able to adaptively select the motion model to be used.
arXiv Detail & Related papers (2022-01-27T09:49:23Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline requires only low-frame-rate videos and unpaired human motion data for training; no high-frame-rate videos are needed.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions interpolating one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramid-style network in the temporal domain to complete the multi-frame interpolation task in one shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z)
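As referenced in the SIGMA entry above, the following is a minimal
Sinkhorn-Knopp sketch of the balancing step that spreads features evenly
across clusters, assuming a precomputed tube-to-cluster similarity matrix.
The function name, temperature, and iteration count are illustrative
assumptions, not the paper's implementation.

    import numpy as np

    def sinkhorn_assign(scores, n_iters=3, eps=0.05):
        """Soft, balanced assignment of n tubes to k clusters.

        scores: (n, k) similarities, e.g. tube_features @ prototypes.T.
        """
        # Positive affinities; subtract the max for numerical stability.
        Q = np.exp((scores - scores.max()) / eps)
        Q /= Q.sum()
        n, k = Q.shape
        for _ in range(n_iters):
            # Column step: every cluster receives total mass 1/k.
            Q /= Q.sum(axis=0, keepdims=True)
            Q /= k
            # Row step: every tube distributes total mass 1/n.
            Q /= Q.sum(axis=1, keepdims=True)
            Q /= n
        return Q * n  # rows sum to ~1: one soft assignment per tube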