Spatio-Temporal Multi-Flow Network for Video Frame Interpolation
- URL: http://arxiv.org/abs/2111.15483v1
- Date: Tue, 30 Nov 2021 15:18:46 GMT
- Title: Spatio-Temporal Multi-Flow Network for Video Frame Interpolation
- Authors: Duolikun Danier, Fan Zhang, David Bull
- Abstract summary: Video frame interpolation (VFI) is a very active research topic, with applications spanning computer vision, post production and video encoding.
We present a novel deep learning based VFI method, ST-MFNet, based on a Spatio-Temporal Multi-Flow architecture.
- Score: 3.6053802212032995
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Video frame interpolation (VFI) is currently a very active research topic,
with applications spanning computer vision, post production and video encoding.
VFI can be extremely challenging, particularly in sequences containing large
motions, occlusions or dynamic textures, where existing approaches fail to
offer perceptually robust interpolation performance. In this context, we
present a novel deep learning based VFI method, ST-MFNet, based on a
Spatio-Temporal Multi-Flow architecture. ST-MFNet employs a new multi-scale
multi-flow predictor to estimate many-to-one intermediate flows, which are
combined with conventional one-to-one optical flows to capture both large and
complex motions. In order to enhance interpolation performance for various
textures, a 3D CNN is also employed to model the content dynamics over an
extended temporal window. Moreover, ST-MFNet has been trained within an ST-GAN
framework, which was originally developed for texture synthesis, with the aim
of further improving perceptual interpolation quality. Our approach has been
comprehensively evaluated -- compared with fourteen state-of-the-art VFI
algorithms -- clearly demonstrating that ST-MFNet consistently outperforms
these benchmarks on varied and representative test datasets, with significant
gains up to 1.09dB in PSNR for cases including large motions and dynamic
textures. Project page: https://danielism97.github.io/ST-MFNet.
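To make the core idea concrete, below is a minimal PyTorch sketch of many-to-one multi-flow warping: instead of fetching each output pixel with a single flow vector, several candidate flows are predicted per pixel and the warped results are blended with learned weights. All names, shapes and the softmax blending are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of many-to-one multi-flow warping (illustrative only).
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """Sample `img` at locations displaced by `flow` (B,2,H,W)."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=img.dtype, device=img.device),
        torch.arange(w, dtype=img.dtype, device=img.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]          # (B,H,W)
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalise to [-1, 1] as required by grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(img, grid, align_corners=True)

def multi_flow_blend(img, flows, weights):
    """flows: (B,K,2,H,W) candidate flows; weights: (B,K,H,W) blend weights."""
    weights = torch.softmax(weights, dim=1)        # convex blend over candidates
    warped = torch.stack(
        [backward_warp(img, flows[:, k]) for k in range(flows.shape[1])], dim=1
    )                                              # (B,K,3,H,W)
    return (weights.unsqueeze(2) * warped).sum(dim=1)

# Toy usage: 4 flow candidates per pixel on a 64x64 frame.
img = torch.rand(1, 3, 64, 64)
flows = torch.randn(1, 4, 2, 64, 64)
weights = torch.randn(1, 4, 64, 64)
out = multi_flow_blend(img, flows, weights)        # (1, 3, 64, 64)
```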
Related papers
- Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
Motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity.
We propose a novel diffusion framework, motion-aware latent diffusion models (MADiff).
Our method achieves state-of-the-art performance, significantly outperforming existing approaches.
arXiv Detail & Related papers (2024-04-21T05:09:56Z)
- Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
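As a rough illustration of estimating intermediate flow directly from consecutive frames while mixing features of different receptive fields, the toy PyTorch head below uses parallel dilated convolutions; it is purely hypothetical and not the MA-VFI architecture.

```python
# Illustrative sketch: direct intermediate-flow prediction from two stacked
# frames, with dilated branches providing different receptive fields.
import torch
import torch.nn as nn

class TinyIntermediateFlow(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(6, ch, 3, padding=1), nn.ReLU())
        # Parallel branches: larger dilation = larger receptive field.
        self.branches = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)]
        )
        # 4 output channels: flow t->0 (2 ch) and flow t->1 (2 ch).
        self.head = nn.Conv2d(3 * ch, 4, 3, padding=1)

    def forward(self, frame0, frame1):
        x = self.stem(torch.cat([frame0, frame1], dim=1))
        x = torch.cat([b(x) for b in self.branches], dim=1)
        flow = self.head(x)
        return flow[:, :2], flow[:, 2:]   # flows from time t to each input

f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
flow_t0, flow_t1 = TinyIntermediateFlow()(f0, f1)   # each (1, 2, 64, 64)
```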
arXiv Detail & Related papers (2024-02-05T11:00:14Z)
- RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
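The tri-plane idea can be sketched as follows: a feature for a space-time point (x, y, t) is assembled from bilinear lookups on three learned 2D planes. The plane layout and the summation here are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch of a tri-plane video representation: the xy, xt and yt planes
# are sampled bilinearly and summed to give a per-point feature.
import torch
import torch.nn.functional as F

def sample_plane(plane, u, v):
    """plane: (1,C,H,W); u, v: (N,) coords in [-1,1]. Returns (N,C)."""
    grid = torch.stack((u, v), dim=-1).view(1, -1, 1, 2)
    feat = F.grid_sample(plane, grid, align_corners=True)  # (1,C,N,1)
    return feat[0, :, :, 0].t()

def triplane_feature(planes, x, y, t):
    """planes: dict with 'xy', 'xt', 'yt' tensors of shape (1,C,R,R)."""
    return (
        sample_plane(planes["xy"], x, y)
        + sample_plane(planes["xt"], x, t)
        + sample_plane(planes["yt"], y, t)
    )

C, R = 16, 64
planes = {k: torch.randn(1, C, R, R) for k in ("xy", "xt", "yt")}
pts = torch.rand(100, 3) * 2 - 1                 # 100 points in [-1,1]^3
feats = triplane_feature(planes, pts[:, 0], pts[:, 1], pts[:, 2])  # (100,16)
```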
arXiv Detail & Related papers (2024-01-11T16:48:44Z)
- A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow [14.877766449009119]
Deep learning-based video frame interpolation (VFI) methods have predominantly focused on estimating motion between two input frames.
We propose a multi-in-single-out (MISO) based VFI method that does not rely on motion vector estimation.
We introduce a novel motion perceptual loss that enables MISO-VFI to better capture the spatio-temporal correlations within the video frames.
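A minimal sketch of the multi-in-single-out pattern follows: several frames go in, one frame comes out, and no optical flow or warping appears anywhere in the pipeline. The network (MisoNet) and its layer sizes are hypothetical.

```python
# Toy multi-in-single-out interpolator: stack frames on the channel axis and
# regress the target frame directly, with no motion estimation.
import torch
import torch.nn as nn

class MisoNet(nn.Module):
    def __init__(self, n_inputs=4, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * n_inputs, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1),           # directly regress RGB
        )

    def forward(self, frames):                        # list of (B,3,H,W)
        return self.net(torch.cat(frames, dim=1))

frames = [torch.rand(1, 3, 64, 64) for _ in range(4)]
pred = MisoNet()(frames)                              # (1, 3, 64, 64)
```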
arXiv Detail & Related papers (2023-11-20T08:29:55Z)
- AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation [80.33846577924363]
We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for video frame interpolation.
It is based on two essential designs. First, we build bidirectional correlation volumes for all pairs of pixels, and use the predicted bilateral flows to retrieve correlations.
Second, we derive multiple groups of fine-grained flow fields from one pair of updated coarse flows for performing backward warping on the input frames separately.
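The first design can be sketched as an all-pairs correlation volume in the style popularised by RAFT: every pixel in frame 0 is correlated with every pixel in frame 1 via a feature dot product. Shapes and the attention-style scaling are illustrative assumptions, not the paper's code.

```python
# Sketch of an all-pairs correlation volume between two feature maps.
import torch

def all_pairs_correlation(feat0, feat1):
    """feat0, feat1: (B,C,H,W). Returns (B, H*W, H, W) correlation volume."""
    b, c, h, w = feat0.shape
    f0 = feat0.flatten(2)                       # (B, C, H*W)
    f1 = feat1.flatten(2)                       # (B, C, H*W)
    corr = torch.bmm(f0.transpose(1, 2), f1)    # (B, H*W, H*W) dot products
    corr = corr / c**0.5                        # scale like dot-product attention
    return corr.view(b, h * w, h, w)

feat0, feat1 = torch.rand(1, 32, 24, 24), torch.rand(1, 32, 24, 24)
corr = all_pairs_correlation(feat0, feat1)      # (1, 576, 24, 24)
```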
arXiv Detail & Related papers (2023-04-19T16:18:47Z)
- Video Frame Interpolation with Transformer [55.12620857638253]
We introduce a novel framework, which takes advantage of Transformer to model long-range pixel correlation among video frames.
Our network is equipped with a novel cross-scale window-based attention mechanism, where cross-scale windows interact with each other.
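For orientation, the sketch below shows only plain single-scale window self-attention in PyTorch; the paper's actual contribution, the interaction between cross-scale windows, is not reproduced here.

```python
# Plain window self-attention: attention is computed independently inside
# each non-overlapping win x win window of the feature map.
import torch
import torch.nn as nn

def window_attention(x, attn, win=8):
    """x: (B,C,H,W); H and W must be divisible by `win`."""
    b, c, h, w = x.shape
    # Partition into windows -> (num_windows*B, win*win, C).
    x = x.view(b, c, h // win, win, w // win, win)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, win * win, c)
    out, _ = attn(x, x, x)                       # self-attention per window
    # Reverse the partition back to (B,C,H,W).
    out = out.view(b, h // win, w // win, win, win, c)
    return out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
x = torch.rand(1, 32, 64, 64)
y = window_attention(x, attn)                    # (1, 32, 64, 64)
```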
arXiv Detail & Related papers (2022-05-15T09:30:28Z)
- Enhancing Deformable Convolution based Video Frame Interpolation with Coarse-to-fine 3D CNN [4.151439675744056]
This paper presents a new deformable convolution-based video frame interpolation (VFI) method, using a coarse-to-fine 3D CNN to enhance the multi-flow prediction.
The results evidently show the effectiveness of the proposed method, which offers superior performance over other state-of-the-art VFI methods.
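A hedged sketch of the deformable-convolution building block, using torchvision's deform_conv2d: a small CNN predicts per-pixel sampling offsets, so input pixels are gathered at learned, content-dependent locations. The single-layer offset predictor is a placeholder, not the paper's coarse-to-fine design.

```python
# Deformable convolution: sampling offsets are predicted from the content.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

kh = kw = 3
offset_net = nn.Conv2d(3, 2 * kh * kw, 3, padding=1)  # (dy, dx) per kernel tap
weight = torch.randn(3, 3, kh, kw)                    # out_ch, in_ch, kh, kw

frame = torch.rand(1, 3, 64, 64)
offsets = offset_net(frame)                           # (1, 18, 64, 64)
out = deform_conv2d(frame, offsets, weight, padding=1)  # (1, 3, 64, 64)
```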
arXiv Detail & Related papers (2022-02-15T21:20:18Z)
- FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation.
We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
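A toy version of flow-agnostic interpolation with 3D space-time convolutions might look like this; layer sizes are illustrative and far smaller than FLAVR itself.

```python
# Flow-free interpolation: 3D convolutions jointly mix space and time, then
# a 2D head regresses the missing frame.
import torch
import torch.nn as nn

class Tiny3DInterp(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, ch, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(ch, 3, kernel_size=3, padding=1)

    def forward(self, clip):                # clip: (B, 3, T, H, W)
        feat = self.net(clip)               # joint space-time features
        return self.head(feat.mean(dim=2))  # pool over time, regress the frame

clip = torch.rand(1, 3, 4, 64, 64)          # four input frames
mid = Tiny3DInterp()(clip)                  # (1, 3, 64, 64) predicted frame
```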
arXiv Detail & Related papers (2020-12-15T18:59:30Z)
- Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated for the low-delay causal settings and compared with H.265/HEVC, H.264/AVC and the other learnt video compression methods.
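As a rough illustration of the learned residual-coding pattern behind such frameworks (not the NVC model itself), the toy autoencoder below quantises its latents, using additive uniform noise as the standard differentiable stand-in for rounding during training.

```python
# Toy learned coder for inter-frame compensation residuals.
import torch
import torch.nn as nn

class ToyResidualCoder(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1),
        )

    def forward(self, residual):
        y = self.enc(residual)
        if self.training:
            # Uniform noise approximates rounding but keeps gradients flowing.
            y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)
        else:
            y_hat = torch.round(y)          # hard quantisation at test time
        return self.dec(y_hat)

residual = torch.rand(1, 3, 64, 64)         # e.g. a compensation residual
recon = ToyResidualCoder()(residual)        # (1, 3, 64, 64)
```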
arXiv Detail & Related papers (2020-07-09T06:15:17Z)