INR-V: A Continuous Representation Space for Video-based Generative
Tasks
- URL: http://arxiv.org/abs/2210.16579v2
- Date: Mon, 3 Apr 2023 02:58:58 GMT
- Title: INR-V: A Continuous Representation Space for Video-based Generative
Tasks
- Authors: Bipasha Sen, Aditya Agarwal, Vinay P Namboodiri, C. V. Jawahar
- Abstract summary: We propose INR-V, a video representation network that learns a continuous space for video-based generative tasks.
The representation space learned by INR-V is more expressive than an image space, showcasing many interesting properties not possible with existing works.
- Score: 43.245717657048296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating videos is a complex task that is accomplished by generating a set
of temporally coherent images frame-by-frame. This limits the expressivity of
videos to image-based operations on the individual video frames, requiring
network designs that obtain temporally coherent trajectories in the underlying
image space. We propose INR-V, a video representation network that learns a
continuous space for video-based generative tasks. INR-V parameterizes videos
using implicit neural representations (INRs): multi-layer perceptrons that
predict an RGB value for each input pixel location of the video. The INR is
predicted using a meta-network which is a hypernetwork trained on neural
representations of multiple video instances. Later, the meta-network can be
sampled to generate diverse novel videos, enabling many downstream video-based
generative tasks. Interestingly, we find that conditional regularization and
progressive weight initialization play a crucial role in obtaining INR-V. The
representation space learned by INR-V is more expressive than an image space,
showcasing many interesting properties not possible with existing works.
For instance, INR-V can smoothly interpolate intermediate videos between known
video instances (such as intermediate identities, expressions, and poses in
face videos). It can also inpaint missing portions in videos to recover
temporally coherent full videos. In this work, we evaluate the space learned by
INR-V on diverse generative tasks such as video interpolation, novel video
generation, video inversion, and video inpainting against the existing
baselines. INR-V significantly outperforms the baselines on several of these
demonstrated tasks, clearly showcasing the potential of the proposed
representation space.
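To make the described architecture concrete, the following is a minimal, hypothetical sketch (in PyTorch) of the idea in the abstract: a small MLP (the INR) maps a pixel location (x, y, t) to an RGB value, and a hypernetwork (the meta-network) maps a per-video latent code to that MLP's weights. All names, layer sizes, and the two-layer INR are illustrative assumptions, not the authors' implementation; interpolating two latent codes gestures at the video-interpolation property mentioned above.

```python
# Hypothetical sketch of an INR-V-style setup (not the authors' code):
# a hypernetwork predicts the weights of a tiny coordinate-MLP that
# maps (x, y, t) pixel locations to RGB values.
import torch
import torch.nn as nn

COORD_DIM, HIDDEN, RGB = 3, 64, 3  # (x, y, t) -> hidden -> RGB (sizes are assumptions)

class HyperINR(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Shapes of the two-layer INR whose weights the hypernetwork predicts.
        self.shapes = {
            "w1": (HIDDEN, COORD_DIM), "b1": (HIDDEN,),
            "w2": (RGB, HIDDEN),       "b2": (RGB,),
        }
        n_params = sum(torch.Size(s).numel() for s in self.shapes.values())
        # Hypernetwork: per-video latent code -> flattened INR parameters.
        self.hyper = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, n_params),
        )

    def forward(self, z, coords):
        """z: (latent_dim,) video code; coords: (N, 3) pixel locations (x, y, t)."""
        flat = self.hyper(z)
        params, i = {}, 0
        for name, shape in self.shapes.items():
            n = torch.Size(shape).numel()
            params[name] = flat[i:i + n].view(shape)
            i += n
        # Evaluate the predicted INR at the queried coordinates.
        h = torch.relu(coords @ params["w1"].T + params["b1"])
        return torch.sigmoid(h @ params["w2"].T + params["b2"])  # (N, 3) RGB

# Usage: sample or interpolate latent codes to decode novel videos.
model = HyperINR()
z_a, z_b = torch.randn(128), torch.randn(128)
coords = torch.rand(1024, 3)                 # normalized (x, y, t) query points
rgb_mid = model(0.5 * (z_a + z_b), coords)   # a video "between" z_a and z_b
```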
Related papers
- Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics [38.52385865743416]
Implicit Neural Representations (INRs) have emerged as powerful representations for encoding all forms of data, including images, videos, audio, and scenes.
These encoded representations lack semantic meaning, so they cannot be used for downstream tasks that require semantic properties, such as retrieval.
We propose a flexible framework that decouples the spatial and temporal aspects of the video INR.
arXiv Detail & Related papers (2024-08-05T17:59:51Z) - NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce NeRV++, an enhanced implicit neural representation for videos.
NeRV++ is a more straightforward yet effective enhancement of the original NeRV decoder architecture.
We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs.
arXiv Detail & Related papers (2024-02-28T13:00:32Z) - Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the difficulty of modeling video dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z) - Progressive Fourier Neural Representation for Sequential Video
Compilation [75.43041679717376]
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
arXiv Detail & Related papers (2023-06-20T06:02:19Z) - Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also improves encoding quality from 34.07 to 34.57 (measured in PSNR).
arXiv Detail & Related papers (2022-10-13T08:15:08Z) - Generating Videos with Dynamics-aware Implicit Generative Adversarial
Networks [68.93429034530077]
We propose a dynamics-aware implicit generative adversarial network (DIGAN) for video generation.
We show that DIGAN can be trained on 128-frame videos of 128x128 resolution, 80 frames longer than the 48 frames of the previous state-of-the-art method.
arXiv Detail & Related papers (2022-02-21T23:24:01Z)