Generating Videos with Dynamics-aware Implicit Generative Adversarial
Networks
- URL: http://arxiv.org/abs/2202.10571v1
- Date: Mon, 21 Feb 2022 23:24:01 GMT
- Title: Generating Videos with Dynamics-aware Implicit Generative Adversarial
Networks
- Authors: Sihyun Yu, Jihoon Tack, Sangwoo Mo, Hyunsu Kim, Junho Kim, Jung-Woo
Ha, Jinwoo Shin
- Abstract summary: We propose dynamics-aware implicit generative adversarial network (DIGAN) for video generation.
We show that DIGAN can be trained on 128-frame videos of 128x128 resolution, 80 frames longer than the 48 frames of the previous state-of-the-art method.
- Score: 68.93429034530077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the deep learning era, generating long, high-quality videos remains
challenging due to the spatio-temporal complexity and continuity of videos.
Prior works have attempted to model the video distribution by representing
videos as 3D grids of RGB values, which impedes the scale of generated videos
and neglects continuous dynamics. In this paper, we find that the recently
emerging paradigm of implicit neural representations (INRs), which encode a
continuous signal into a parameterized neural network, effectively mitigates the
issue. By utilizing INRs of video, we propose the dynamics-aware implicit
generative adversarial network (DIGAN), a novel generative adversarial network
for video generation. Specifically, we introduce (a) an INR-based video
generator that improves the motion dynamics by manipulating the space and time
coordinates differently and (b) a motion discriminator that efficiently
identifies unnatural motions without observing entire long frame
sequences. We demonstrate the superiority of DIGAN on various datasets,
along with multiple intriguing properties, e.g., long video synthesis, video
extrapolation, and non-autoregressive video generation. For example, DIGAN
improves the previous state-of-the-art FVD score on UCF-101 by 30.7% and can be
trained on 128-frame videos of 128x128 resolution, 80 frames longer than the 48
frames of the previous state-of-the-art method.
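The abstract describes the architecture only at a high level. As a rough, hedged illustration of the two components, the PyTorch-style sketch below shows (a) an INR generator that embeds the spatial coordinates and the time coordinate through separate branches and (b) a motion discriminator that scores a pair of frames together with their time gap. Every module name, layer size, and activation here is an assumption made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: an INR-style video generator that treats space and
# time coordinates differently, plus a motion discriminator that judges a pair
# of frames and their time gap. Shapes and layer sizes are assumptions.
import torch
import torch.nn as nn


class INRVideoGenerator(nn.Module):
    def __init__(self, z_dim=128, hidden=256):
        super().__init__()
        # Separate branches so spatial coordinates and the time coordinate are
        # embedded (and can be modulated) differently.
        self.space_embed = nn.Linear(2, hidden)   # (x, y) -> features
        self.time_embed = nn.Linear(1, hidden)    # t -> features
        self.latent_embed = nn.Linear(z_dim, hidden)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),                 # RGB at each (x, y, t)
        )

    def forward(self, z, coords_xy, t):
        # coords_xy: (N, 2) in [-1, 1]; t: (N, 1) in [0, 1]; z: (1, z_dim)
        h = self.space_embed(coords_xy) + self.time_embed(t) + self.latent_embed(z)
        return torch.sigmoid(self.mlp(torch.sin(h)))  # sinusoidal activation, INR-style


class MotionDiscriminator(nn.Module):
    """Scores a pair of frames plus their time gap instead of a full clip."""
    def __init__(self, channels=3, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * channels + 1, hidden, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(hidden, hidden * 2, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.out = nn.Linear(hidden * 2, 1)

    def forward(self, frame_a, frame_b, dt):
        # Broadcast dt to a constant plane so the critic sees the time gap.
        dt_plane = dt.view(-1, 1, 1, 1).expand(-1, 1, *frame_a.shape[-2:])
        x = torch.cat([frame_a, frame_b, dt_plane], dim=1)
        return self.out(self.conv(x).flatten(1))
```

Because such a generator is a function of continuous (x, y, t), sampling a frame at any time reduces to evaluating it on a dense (x, y) grid, which is what makes the non-autoregressive generation and video extrapolation mentioned above natural in this formulation.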
Related papers
- Unfolding Videos Dynamics via Taylor Expansion [5.723852805622308]
We present a new self-supervised dynamics learning strategy for videos: Video Time-Differentiation for Instance Discrimination (ViDiDi).
ViDiDi observes different aspects of a video through various orders of temporal derivatives of its frame sequence.
ViDiDi learns a single neural network that encodes a video and its temporal derivatives into consistent embeddings.
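As a loose illustration of the temporal-derivative idea summarized above (not the ViDiDi code), the different "orders" can be approximated by repeated finite differences along the time axis before each view is passed through a shared encoder; the helper below is hypothetical.

```python
# Hypothetical helper: approximate temporal derivatives of a video tensor by
# repeated forward differences along the time axis.
import torch


def temporal_derivatives(video: torch.Tensor, max_order: int = 2):
    """video: (T, C, H, W). Returns [video, 1st difference, ..., max_order-th]."""
    views = [video]
    current = video
    for _ in range(max_order):
        current = current[1:] - current[:-1]  # one more order of differencing
        views.append(current)
    return views


# Each view would then be encoded by the same network, with a consistency loss
# pulling the embeddings of corresponding clips together.
```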
arXiv Detail & Related papers (2024-09-04T01:41:09Z)
- ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models [66.84478240757038]
A majority of video diffusion models (VDMs) generate long videos in an autoregressive manner, i.e., generating subsequent clips conditioned on the last frames of the previous clip.
We introduce causal (i.e., unidirectional) generation into VDMs, and use past frames as a prompt to generate future frames.
Our ViD-GPT achieves state-of-the-art performance both quantitatively and qualitatively on long video generation.
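The causal (unidirectional) constraint can be pictured as a frame-level attention mask in which tokens of a frame may attend only to tokens of the same or earlier frames; the snippet below is a generic illustration of that idea, not the ViD-GPT code.

```python
# Generic illustration: a frame-level causal mask for autoregressive video models.
import torch


def frame_causal_mask(num_frames: int, tokens_per_frame: int) -> torch.Tensor:
    """True where attention is allowed: key frame index <= query frame index."""
    frame_ids = torch.arange(num_frames).repeat_interleave(tokens_per_frame)
    return frame_ids[None, :] <= frame_ids[:, None]
```

For example, with three frames of two tokens each, the resulting 6x6 boolean matrix allows attention within the current frame and to all earlier frames, but never forward in time.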
arXiv Detail & Related papers (2024-06-16T15:37:22Z)
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text [58.49820807662246]
We introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.
Our code will be available at: https://github.com/Picsart-AI-Research/StreamingT2V.
arXiv Detail & Related papers (2024-03-21T18:27:29Z)
- Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling the temporal relations for composing videos of arbitrary length, from a few frames to even infinitely many, using generative adversarial networks (GANs).
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from 34.07 to 34.57.
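One way to read "learnable positional features" is a trainable feature volume that is interpolated at continuous coordinates and decoded by a small MLP; the sketch below is an assumption-based illustration, not the NVP architecture.

```python
# Sketch under assumptions: a learnable 3D feature grid queried at continuous
# (x, y, t) coordinates via trilinear interpolation, then decoded to RGB.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableGridVideo(nn.Module):
    def __init__(self, grid=(16, 32, 32), feat_dim=16):
        super().__init__()
        # Trainable feature volume of shape (1, feat_dim, T_grid, H_grid, W_grid).
        self.features = nn.Parameter(torch.randn(1, feat_dim, *grid) * 0.01)
        self.decoder = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, coords):
        # coords: (N, 3) in [-1, 1], ordered (x, y, t) as grid_sample expects.
        grid = coords.view(1, -1, 1, 1, 3)
        sampled = F.grid_sample(self.features, grid, align_corners=True)  # (1, C, N, 1, 1)
        feats = sampled.squeeze(-1).squeeze(-1).squeeze(0).t()            # (N, C)
        return torch.sigmoid(self.decoder(feats))
```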
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Autoencoding Video Latents for Adversarial Video Generation [0.0]
AVLAE is a two-stream latent autoencoder where the video distribution is learned by adversarial training.
We demonstrate that our approach learns to disentangle motion and appearance codes even without the explicit structural composition in the generator.
arXiv Detail & Related papers (2022-01-18T11:42:14Z)
- Vid-ODE: Continuous-Time Video Generation with Neural Ordinary
Differential Equation [42.85126020237214]
We propose continuous-time video generation by combining neural ODE (Vid-ODE) with pixel-level video processing techniques.
Vid-ODE is the first work successfully performing continuous-time video generation using real-world videos.
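The continuous-time aspect can be sketched generically as integrating a learned latent derivative with a fixed-step Euler solver and decoding the latent state at arbitrary, possibly irregular, timestamps. The code below is such a generic neural-ODE sketch, not the Vid-ODE model; all sizes are placeholders.

```python
# Generic neural-ODE sketch (not Vid-ODE itself): evolve a latent state with a
# learned derivative and decode it at arbitrary timestamps.
import torch
import torch.nn as nn


class LatentODE(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.dynamics = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.decoder = nn.Linear(dim, 3 * 32 * 32)  # tiny 32x32 RGB frames

    def forward(self, z0, timestamps, steps_per_unit=20):
        """z0: (B, dim); timestamps: increasing list of floats > 0."""
        frames, z, t = [], z0, 0.0
        for t_next in timestamps:
            n = max(1, int((t_next - t) * steps_per_unit))
            dt = (t_next - t) / n
            for _ in range(n):                     # fixed-step Euler from t to t_next
                z = z + dt * self.dynamics(z)
            t = t_next
            frames.append(self.decoder(z).view(-1, 3, 32, 32))
        return torch.stack(frames, dim=1)          # (B, T, 3, 32, 32)


# Example: LatentODE()(torch.randn(2, 64), [0.1, 0.25, 0.7]) -> (2, 3, 3, 32, 32)
```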
arXiv Detail & Related papers (2020-10-16T06:50:47Z)
- Recurrent Deconvolutional Generative Adversarial Networks with
Application to Text Guided Video Generation [11.15855312510806]
We propose a recurrent deconvolutional generative adversarial network (RD-GAN), which includes a 3D convolutional neural network (3D-CNN) as the discriminator.
The proposed model can be jointly trained by pushing the recurrent deconvolutional network (RDN) to generate realistic videos so that the 3D-CNN cannot distinguish them from real ones.
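A 3D-CNN video discriminator of the kind mentioned above can be sketched as follows; the widths, depth, and kernel sizes are placeholders rather than the RD-GAN configuration.

```python
# Placeholder sketch of a 3D-CNN video discriminator: it convolves jointly over
# time and space and emits a single realism score per clip of shape (B, C, T, H, W).
import torch.nn as nn


def video_discriminator_3d(channels=3, width=64):
    return nn.Sequential(
        nn.Conv3d(channels, width, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.Conv3d(width, width * 2, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2),
        nn.AdaptiveAvgPool3d(1),
        nn.Flatten(),
        nn.Linear(width * 2, 1),   # real/fake logit for the whole clip
    )
```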
arXiv Detail & Related papers (2020-08-13T12:22:27Z)
- Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning.
Our approach generates videos of superior quality compared to existing state-of-the-art methods.
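The non-adversarial objective can be pictured as jointly optimizing per-video latent codes and generator weights against a plain reconstruction loss; the toy loop below is a schematic assumption (with a trivial MLP standing in for the recurrent generator), not the authors' training procedure.

```python
# Schematic toy: jointly optimize per-video latent codes and the generator's
# weights with a reconstruction loss instead of an adversarial game.
import torch
import torch.nn as nn

num_videos, z_dim = 8, 32
videos = torch.rand(num_videos, 4, 3, 16, 16)           # 8 tiny 4-frame clips
generator = nn.Sequential(                               # MLP stand-in for an RNN generator
    nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 4 * 3 * 16 * 16)
)
latents = nn.Parameter(torch.randn(num_videos, z_dim))   # the input latent space is learned too
opt = torch.optim.Adam([latents, *generator.parameters()], lr=1e-3)

for step in range(200):
    recon = generator(latents).view_as(videos)
    loss = nn.functional.mse_loss(recon, videos)
    opt.zero_grad()
    loss.backward()
    opt.step()
```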
arXiv Detail & Related papers (2020-03-21T02:57:33Z)