Enhancing Deformable Convolution based Video Frame Interpolation with
Coarse-to-fine 3D CNN
- URL: http://arxiv.org/abs/2202.07731v2
- Date: Thu, 22 Jun 2023 12:58:40 GMT
- Title: Enhancing Deformable Convolution based Video Frame Interpolation with
Coarse-to-fine 3D CNN
- Authors: Duolikun Danier, Fan Zhang and David Bull
- Abstract summary: This paper presents a new deformable convolution-based video frame (VFI) method, using a coarse to fine 3D CNN to enhance the multi-flow prediction.
The results evidently show the effectiveness of the proposed method, which offers superior performance over other state-of-the-art VFI methods.
- Score: 4.151439675744056
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a new deformable convolution-based video frame
interpolation (VFI) method, using a coarse to fine 3D CNN to enhance the
multi-flow prediction. This model first extracts spatio-temporal features at
multiple scales using a 3D CNN, and estimates multi-flows using these features
in a coarse-to-fine manner. The estimated multi-flows are then used to warp the
original input frames as well as context maps, and the warped results are fused
by a synthesis network to produce the final output. This VFI approach has been
fully evaluated against 12 state-of-the-art VFI methods on three commonly used
test databases. The results evidently show the effectiveness of the proposed
method, which offers superior interpolation performance over other state of the
art algorithms, with PSNR gains up to 0.19dB.
Related papers
- PolyDiff: Generating 3D Polygonal Meshes with Diffusion Models [15.846449180313778]
PolyDiff is the first diffusion-based approach capable of directly generating realistic and diverse 3D polygonal meshes.
Our model is capable of producing high-quality 3D polygonal meshes, ready for integration into downstream 3D.
arXiv Detail & Related papers (2023-12-18T18:19:26Z) - Deepfake Detection: Leveraging the Power of 2D and 3D CNN Ensembles [0.0]
This work presents an innovative approach to validate video content.
The methodology blends advanced 2-dimensional and 3-dimensional Convolutional Neural Networks.
Experimental validation underscores the effectiveness of this strategy, showcasing its potential in countering deepfakes generation.
arXiv Detail & Related papers (2023-10-25T06:00:37Z) - H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions [63.23985601478339]
We propose a simple yet effective solution, H-VFI, to deal with large motions in video frame.
H-VFI contributes a hierarchical video transformer to learn a deformable kernel in a coarse-to-fine strategy.
The advantage of such a progressive approximation is that the large motion frame problem can be predicted into several relatively simpler sub-tasks.
arXiv Detail & Related papers (2022-11-21T09:49:23Z) - Positional Encoding Augmented GAN for the Assessment of Wind Flow for
Pedestrian Comfort in Urban Areas [0.41998444721319217]
This work rephrases the problem from computing 3D flow fields using CFD to a 2D image-to-image translation-based problem on the building footprints to predict the flow field at pedestrian height level.
We investigate the use of generative adversarial networks (GAN), such as Pix2Pix and CycleGAN representing state-of-the-art for image-to-image translation task in various domains.
arXiv Detail & Related papers (2021-12-15T19:37:11Z) - 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z) - Spatio-Temporal Multi-Flow Network for Video Frame Interpolation [3.6053802212032995]
Video frame (VFI) is a very active research topic, with applications spanning computer vision, post production and video encoding.
We present a novel deep learning based VFI method, ST-MFNet, based on a Spatio-Temporal Multi-Flow architecture.
arXiv Detail & Related papers (2021-11-30T15:18:46Z) - A Novel Patch Convolutional Neural Network for View-based 3D Model
Retrieval [36.12906920608775]
We propose a novel patch convolutional neural network (PCNN) for view-based 3D model retrieval.
Our proposed PCNN can outperform state-of-the-art approaches, with mAP alues of 93.67%, and 96.23%, respectively.
arXiv Detail & Related papers (2021-09-25T07:18:23Z) - Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z) - FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video framesupervised.
We demonstrate that FLAVR can serve as a useful self- pretext task for action recognition, optical flow estimation, and motion magnification.
arXiv Detail & Related papers (2020-12-15T18:59:30Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Cascaded Deep Video Deblurring Using Temporal Sharpness Prior [88.98348546566675]
The proposed algorithm mainly consists of optical flow estimation from intermediate latent frames and latent frame restoration steps.
It first develops a deep CNN model to estimate optical flow from intermediate latent frames and then restores the latent frames based on the estimated optical flow.
We show that exploring the domain knowledge of video deblurring is able to make the deep CNN model more compact and efficient.
arXiv Detail & Related papers (2020-04-06T09:13:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.