Learning Cross-Video Neural Representations for High-Quality Frame
Interpolation
- URL: http://arxiv.org/abs/2203.00137v1
- Date: Mon, 28 Feb 2022 23:16:02 GMT
- Title: Learning Cross-Video Neural Representations for High-Quality Frame
Interpolation
- Authors: Wentao Shangguan, Yu Sun, Weijie Gan, Ulugbek S. Kamilov
- Abstract summary: Cross-Video Neural Representation (CURE) is the first video interpolation method based on neural fields (NF).
CURE represents the video as a continuous function parameterized by a coordinate-based neural network.
- Score: 13.711714133169961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers the problem of temporal video interpolation, where the
goal is to synthesize a new video frame given its two neighbors. We propose
Cross-Video Neural Representation (CURE) as the first video interpolation
method based on neural fields (NF). NF refers to the recent class of methods
for the neural representation of complex 3D scenes that has seen widespread
success and application across computer vision. CURE represents the video as a
continuous function parameterized by a coordinate-based neural network, whose
inputs are the spatiotemporal coordinates and outputs are the corresponding RGB
values. CURE introduces a new architecture that conditions the neural network
on the input frames for imposing space-time consistency in the synthesized
video. This not only improves the final interpolation quality, but also enables
CURE to learn a prior across multiple videos. Experimental evaluations show
that CURE achieves the state-of-the-art performance on video interpolation on
several benchmark datasets.
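Since the CURE code is not reproduced here, the following is a minimal, hypothetical sketch of the central idea from the abstract: a coordinate-based network that maps spatiotemporal coordinates to RGB while being conditioned on the two neighboring frames. The module names, layer sizes, and the simple global-pooling conditioning are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch (not the official CURE code): a coordinate-based MLP that
# maps spatiotemporal coordinates (x, y, t) to RGB, conditioned on features
# extracted from the two neighboring frames. All names are assumptions.
import torch
import torch.nn as nn

class ConditionedCoordinateMLP(nn.Module):
    def __init__(self, cond_dim=64, hidden=256):
        super().__init__()
        # Small CNN that summarizes the two input frames into a conditioning
        # vector (a stand-in for the paper's conditioning mechanism).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, cond_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Coordinate MLP: input is (x, y, t) plus the conditioning vector.
        self.mlp = nn.Sequential(
            nn.Linear(3 + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, coords, frame0, frame1):
        # coords: (N, 3) spatiotemporal coordinates; frames: (1, 3, H, W)
        cond = self.encoder(torch.cat([frame0, frame1], dim=1)).flatten(1)
        cond = cond.expand(coords.shape[0], -1)
        return self.mlp(torch.cat([coords, cond], dim=-1))  # (N, 3)

# Usage: query RGB values at the intermediate time t = 0.5.
if __name__ == "__main__":
    model = ConditionedCoordinateMLP()
    f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    xy = torch.rand(1024, 2)
    t = torch.full((1024, 1), 0.5)
    rgb = model(torch.cat([xy, t], dim=-1), f0, f1)
    print(rgb.shape)  # torch.Size([1024, 3])
```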
Related papers
- D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper introduces a new method for dynamic novel-view synthesis from monocular video, such as smartphone captures.
Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
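As a rough illustration of the description above (not the D-NPC implementation), the sketch below looks up geometry and appearance features for 3D points in separate learnable feature volumes and conditions small decoder heads on time; a dense grid stands in for the paper's hash-encoded grids, and all names are assumptions.

```python
# Illustrative sketch only: separate feature volumes for geometry and
# appearance, queried per point and decoded with time conditioning.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicNeuralPointCloud(nn.Module):
    def __init__(self, res=32, feat_dim=16):
        super().__init__()
        # Separate 3D feature volumes for geometry and appearance.
        self.geom_grid = nn.Parameter(torch.randn(1, feat_dim, res, res, res) * 0.01)
        self.app_grid = nn.Parameter(torch.randn(1, feat_dim, res, res, res) * 0.01)
        # Small heads conditioned on time t.
        self.geom_head = nn.Sequential(nn.Linear(feat_dim + 1, 64), nn.ReLU(), nn.Linear(64, 3))
        self.app_head = nn.Sequential(nn.Linear(feat_dim + 1, 64), nn.ReLU(), nn.Linear(64, 3))

    def lookup(self, grid, pts):
        # pts: (N, 3) in [-1, 1]; trilinear lookup via grid_sample.
        g = pts.view(1, -1, 1, 1, 3)
        feat = F.grid_sample(grid, g, align_corners=True)  # (1, C, N, 1, 1)
        return feat.view(grid.shape[1], -1).t()            # (N, C)

    def forward(self, pts, t):
        # pts: (N, 3) point positions; t: (1, 1) normalized time.
        t = t.expand(pts.shape[0], 1)
        offset = self.geom_head(torch.cat([self.lookup(self.geom_grid, pts), t], -1))
        rgb = self.app_head(torch.cat([self.lookup(self.app_grid, pts), t], -1))
        return pts + offset, torch.sigmoid(rgb)  # displaced points and colors
```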
arXiv Detail & Related papers (2024-06-14T14:35:44Z) - Progressive Fourier Neural Representation for Sequential Video
Compilation [75.43041679717376]
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
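The following is a loose, hypothetical sketch of the continual-encoding idea mentioned above: each new session freezes previously learned parameters and trains only a freshly added sub-module on Fourier features of the frame index. This simplification is not the PFNR method; all names and sizes are assumptions.

```python
# Session-wise continual video encoding sketch (not PFNR): a shared trunk over
# Fourier features, plus one small trainable head per encoding session.
import torch
import torch.nn as nn

class SessionwiseVideoINR(nn.Module):
    def __init__(self, num_freqs=16, hidden=128):
        super().__init__()
        self.num_freqs = num_freqs
        # Shared trunk, frozen after the first session.
        self.trunk = nn.Sequential(nn.Linear(2 * num_freqs, hidden), nn.ReLU())
        self.heads = nn.ModuleList()  # one small sub-module per encoded video

    def fourier_features(self, t):
        # t: (N, 1) normalized frame index -> (N, 2 * num_freqs)
        freqs = 2.0 ** torch.arange(self.num_freqs, device=t.device)
        ang = torch.pi * t * freqs
        return torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1)

    def add_session(self, out_dim):
        # After the first session, freeze everything learned so far so earlier
        # videos are not overwritten; only the new head is trained.
        if len(self.heads) > 0:
            for p in self.parameters():
                p.requires_grad_(False)
        head = nn.Linear(self.trunk[0].out_features, out_dim)  # e.g. a flattened patch
        self.heads.append(head)
        return head  # only this sub-module is optimized in the new session

    def forward(self, t, session_id):
        return self.heads[session_id](self.trunk(self.fourier_features(t)))
```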
arXiv Detail & Related papers (2023-06-20T06:02:19Z) - Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving from 34.07 to 34.57 (measured with the PSNR metric).
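As an illustration of representing a video with learnable positional features (not the NVP code), the sketch below stores learnable latent grids that are interpolated at each (x, y, t) query and decoded to RGB; the spatial-grid/temporal-line factorization and all names are simplifying assumptions.

```python
# Learnable-positional-feature sketch: a video stored as latent grids that are
# interpolated per query and decoded by a small MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentGridVideo(nn.Module):
    def __init__(self, h=64, w=64, frames=32, feat_dim=16):
        super().__init__()
        self.spatial = nn.Parameter(torch.randn(1, feat_dim, h, w) * 0.01)      # (x, y) features
        self.temporal = nn.Parameter(torch.randn(1, feat_dim, frames) * 0.01)   # t features
        self.decoder = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, 3), nn.Sigmoid()
        )

    def forward(self, xy, t):
        # xy: (N, 2) in [-1, 1]; t: (N, 1) in [-1, 1]
        s = F.grid_sample(self.spatial, xy.view(1, -1, 1, 2), align_corners=True)
        s = s.view(self.spatial.shape[1], -1).t()                      # (N, C)
        tm = F.grid_sample(
            self.temporal.unsqueeze(-1),                               # (1, C, T, 1)
            torch.cat([torch.zeros_like(t), t], dim=-1).view(1, -1, 1, 2),
            align_corners=True,
        ).view(self.temporal.shape[1], -1).t()                         # (N, C)
        return self.decoder(torch.cat([s, tm], dim=-1))                # (N, 3) RGB
```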
arXiv Detail & Related papers (2022-10-13T08:15:08Z) - Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene
Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, including EndoVis18 Challenge and CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z) - Condensing a Sequence to One Informative Frame for Video Recognition [113.3056598548736]
This paper studies a two-step alternative that first condenses the video sequence into one informative "frame" and then applies recognition to it.
A key question is how to define "useful information" and how to distill it from a sequence down to one synthetic frame.
The proposed IFS consistently demonstrates clear improvements with both image-based 2D networks and clip-based 3D networks.
arXiv Detail & Related papers (2022-01-11T16:13:43Z) - Neural 3D Video Synthesis [18.116032726623608]
We propose a novel approach for 3D video synthesis that is able to represent multi-view video recordings of a dynamic real-world scene.
Our approach takes the high quality and compactness of static neural radiance fields in a new direction: to a model-free, dynamic setting.
We demonstrate that our method can render high-fidelity wide-angle novel views at over 1K resolution, even for highly complex and dynamic scenes.
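A compact, hypothetical sketch of a time-conditioned radiance field in the spirit of the description above (not the authors' model): a NeRF-style MLP that additionally receives a learnable per-frame latent code, so one network can represent a dynamic scene. Sizes and names are assumptions.

```python
# Dynamic radiance field sketch: position + view direction + per-frame latent
# code -> color and density.
import torch
import torch.nn as nn

class DynamicRadianceField(nn.Module):
    def __init__(self, num_frames=300, latent_dim=32, hidden=256):
        super().__init__()
        self.frame_codes = nn.Embedding(num_frames, latent_dim)  # one code per frame
        self.net = nn.Sequential(
            nn.Linear(3 + 3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # RGB + density
        )

    def forward(self, xyz, view_dir, frame_idx):
        # xyz, view_dir: (N, 3); frame_idx: (N,) long tensor of frame indices
        z = self.frame_codes(frame_idx)                        # (N, latent_dim)
        out = self.net(torch.cat([xyz, view_dir, z], dim=-1))
        rgb = torch.sigmoid(out[:, :3])
        density = torch.relu(out[:, 3:])
        return rgb, density
```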
arXiv Detail & Related papers (2021-03-03T18:47:40Z) - W-Cell-Net: Multi-frame Interpolation of Cellular Microscopy Videos [1.7205106391379026]
We apply recent advances in deep learning-based video frame interpolation to increase the temporal resolution of fluorescent microscopy time-lapse movies.
To our knowledge, there is no previous work that uses Convolutional Neural Networks (CNN) to generate frames between two consecutive microscopy images.
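For illustration only (not W-Cell-Net itself), a minimal CNN that takes two consecutive microscopy frames and predicts one intermediate frame; the architecture and names are assumptions.

```python
# Tiny encoder-decoder for two-frame interpolation of grayscale microscopy images.
import torch
import torch.nn as nn

class TinyFrameInterpolator(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),          # downsample
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # upsample
            nn.Conv2d(32, channels, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame0, frame1):
        # frame0, frame1: (B, C, H, W) consecutive frames; returns the middle frame.
        return self.net(torch.cat([frame0, frame1], dim=1))
```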
arXiv Detail & Related papers (2020-05-14T01:33:38Z) - Neural Human Video Rendering by Learning Dynamic Textures and
Rendering-to-Video Translation [99.64565200170897]
We propose a novel human video synthesis method by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space.
We show several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-01-14T18:06:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.