Transformation-based Adversarial Video Prediction on Large-Scale Data
- URL: http://arxiv.org/abs/2003.04035v3
- Date: Wed, 17 Nov 2021 17:56:08 GMT
- Title: Transformation-based Adversarial Video Prediction on Large-Scale Data
- Authors: Pauline Luc, Aidan Clark, Sander Dieleman, Diego de Las Casas, Yotam
Doron, Albin Cassirer, Karen Simonyan
- Abstract summary: We focus on the task of video prediction, where given a sequence of frames extracted from a video, the goal is to generate a plausible future sequence.
We first improve the state of the art by performing a systematic empirical study of discriminator decompositions.
We then propose a novel recurrent unit which transforms its past hidden state according to predicted motion-like features.
- Score: 19.281817081571408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent breakthroughs in adversarial generative modeling have led to models
capable of producing video samples of high quality, even on large and complex
datasets of real-world video. In this work, we focus on the task of video
prediction, where given a sequence of frames extracted from a video, the goal
is to generate a plausible future sequence. We first improve the state of the
art by performing a systematic empirical study of discriminator decompositions
and proposing an architecture that yields faster convergence and higher
performance than previous approaches. We then analyze recurrent units in the
generator, and propose a novel recurrent unit which transforms its past hidden
state according to predicted motion-like features, and refines it to handle
dis-occlusions, scene changes and other complex behavior. We show that this
recurrent unit consistently outperforms previous designs. Our final model leads
to a leap in the state-of-the-art performance, obtaining a test set Fréchet
Video Distance of 25.7, down from 69.2, on the large-scale Kinetics-600
dataset.
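The transformation-based recurrent unit described above is the paper's core architectural contribution. As a rough illustration of the general idea, the PyTorch sketch below predicts a two-channel flow field from the current input and previous hidden state, warps the hidden state by that flow, and blends in a refinement candidate through a learned gate for regions the warp cannot explain (dis-occlusions, scene changes). The layer choices, shapes, and gating scheme are illustrative assumptions, not the authors' exact unit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformationRecurrentUnit(nn.Module):
    """Illustrative recurrent unit: warp the previous hidden state by a
    predicted flow field, then gate against a refinement candidate."""

    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.flow = nn.Conv2d(in_ch + hid_ch, 2, kernel_size=3, padding=1)
        self.gate = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel_size=3, padding=1)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel_size=3, padding=1)

    def warp(self, h: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # Build a sampling grid offset by the predicted flow (in pixels).
        n, _, H, W = h.shape
        ys, xs = torch.meshgrid(
            torch.arange(H, device=h.device, dtype=h.dtype),
            torch.arange(W, device=h.device, dtype=h.dtype),
            indexing="ij",
        )
        grid_x = xs.unsqueeze(0) + flow[:, 0]
        grid_y = ys.unsqueeze(0) + flow[:, 1]
        # Normalise coordinates to [-1, 1] as grid_sample expects.
        grid = torch.stack(
            (2.0 * grid_x / max(W - 1, 1) - 1.0,
             2.0 * grid_y / max(H - 1, 1) - 1.0), dim=-1)
        return F.grid_sample(h, grid, align_corners=True)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h], dim=1)
        h_warped = self.warp(h, self.flow(xh))    # motion step
        z = torch.sigmoid(self.gate(xh))          # dis-occlusion gate
        h_cand = torch.tanh(self.cand(xh))        # refinement candidate
        return z * h_warped + (1.0 - z) * h_cand
```

Unrolled over a frame sequence (e.g. `h = unit(frame_features, h)` at each step), such a unit carries forward a motion-transformed hidden state that the generator would decode into the next frame.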
Related papers
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Video Probabilistic Diffusion Models in Projected Latent Space [75.4253202574722]
We propose a novel generative model for videos, coined projected latent video diffusion models (PVDM).
PVDM learns a video distribution in a low-dimensional latent space and thus can be efficiently trained with high-resolution videos under limited resources.
arXiv Detail & Related papers (2023-02-15T14:22:34Z)
- HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator [90.74663948713615]
We train an autoregressive latent video prediction model capable of predicting high-fidelity future frames.
We produce high-resolution (256x256) videos with minimal modification to existing models.
arXiv Detail & Related papers (2022-09-15T08:41:57Z)
- Diffusion Probabilistic Modeling for Video Generation [17.48026395867434]
Denoising diffusion probabilistic models are a promising new class of generative models that are competitive with GANs on perceptual metrics.
Inspired by recent advances in neural video compression, we use denoising diffusion models to generate a residual correction to a deterministic next-frame prediction.
We find significant improvements in perceptual quality on all datasets, and improvements in frame forecasting for complex high-resolution videos.
arXiv Detail & Related papers (2022-03-16T03:52:45Z)
- Insights from Generative Modeling for Neural Video Compression [31.59496634465347]
We present newly proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling.
We propose several architectures that yield state-of-the-art video compression performance on high-resolution video.
We provide further evidence that the generative modeling viewpoint can advance the neural video coding field.
arXiv Detail & Related papers (2021-07-28T02:19:39Z)
- FitVid: Overfitting in Pixel-Level Video Prediction [117.59339756506142]
We introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks.
FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.
arXiv Detail & Related papers (2021-06-24T17:20:21Z)
- Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction [79.23730812282093]
We introduce Greedy Hierarchical Variational Autoencoders (GHVAEs), a method that learns high-fidelity video predictions by greedily training each level of a hierarchical autoencoder.
GHVAEs provide 17-55% gains in prediction performance on four video datasets, a 35-40% higher success rate on real robot tasks, and can improve performance monotonically by simply adding more modules.
arXiv Detail & Related papers (2021-03-06T18:58:56Z)
- Predicting Video with VQVAE [8.698137120086063]
We use Vector Quantized Variational AutoEncoders (VQ-VAE) to compress high-resolution videos into a hierarchical set of discrete latent variables.
Compared to pixels, this compressed latent space has dramatically reduced dimensionality, allowing us to apply scalable autoregressive generative models to predict video.
To our knowledge, we predict unconstrained videos at a higher resolution (256x256) than any previous method.
arXiv Detail & Related papers (2021-03-02T18:59:10Z)
- Future Video Synthesis with Object Motion Prediction [54.31508711871764]
Instead of synthesizing images directly, our approach is designed to understand the complex scene dynamics.
The appearance of the scene components in the future is predicted by non-rigid deformation of the background and affine transformation of moving objects.
Experimental results on the Cityscapes and KITTI datasets show that our model outperforms the state-of-the-art in terms of visual quality and accuracy.
arXiv Detail & Related papers (2020-04-01T16:09:54Z)
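Most of the papers above, like the headline result of this one (test FVD of 25.7, down from 69.2, on Kinetics-600), report Fréchet Video Distance: the Fréchet distance between Gaussian fits of pretrained video-network (I3D) embeddings of real and generated clips. Below is a minimal NumPy/SciPy sketch of the distance itself, assuming the embeddings have already been computed; the helper name is illustrative.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Frechet distance between Gaussian fits of two feature sets,
    each of shape (num_videos, feature_dim)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; drop the tiny
    # imaginary component that numerical error can introduce.
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```

In practice a small epsilon is often added to the covariance diagonals when a limited number of samples leaves the matrices near-singular.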
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.