Simple Video Generation using Neural ODEs
- URL: http://arxiv.org/abs/2109.03292v1
- Date: Tue, 7 Sep 2021 19:03:33 GMT
- Title: Simple Video Generation using Neural ODEs
- Authors: David Kanaa and Vikram Voleti and Samira Ebrahimi Kahou and
Christopher Pal
- Abstract summary: We learn latent variable models that predict the future in latent space and project back to pixels.
We show that our approach yields promising results in the task of future frame prediction on the Moving MNIST dataset with 1 and 2 digits.
- Score: 9.303957136142293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite having been studied to a great extent, the task of conditional
generation of sequences of frames, or videos, remains extremely challenging. It
is a common belief that a key step towards solving this task resides in
modelling accurately both spatial and temporal information in video signals. A
promising direction to do so has been to learn latent variable models that
predict the future in latent space and project back to pixels, as suggested in
recent literature. Following this line of work and building on top of a family
of models introduced in prior work, Neural ODE, we investigate an approach that
models time-continuous dynamics over a continuous latent space with a
differential equation with respect to time. The intuition behind this approach
is that these trajectories in latent space could then be extrapolated to
generate video frames beyond the time steps for which the model is trained. We
show that our approach yields promising results in the task of future frame
prediction on the Moving MNIST dataset with 1 and 2 digits.
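To make the approach concrete, here is a minimal, hedged sketch of a latent Neural ODE video predictor in PyTorch: a conditioning frame is encoded into a latent state, a learned differential equation is integrated forward over continuous time (here with a simple fixed-step Euler solver), and each point on the latent trajectory is decoded back to a frame. Extrapolating beyond the training horizon amounts to extending the integration times. All module names, layer sizes, and the choice of solver are illustrative assumptions; this is not the paper's exact architecture.

```python
# Minimal sketch of a latent Neural ODE video predictor (PyTorch).
# All architecture choices (latent size, conv stacks, Euler integration)
# are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn


class LatentDynamics(nn.Module):
    """Parameterises dz/dt = f(z, t) with a small MLP."""

    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, 128), nn.Tanh(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, z, t):
        # Broadcast the scalar time over the batch and feed it with the state.
        t_col = t * torch.ones(z.shape[0], 1, device=z.device)
        return self.net(torch.cat([z, t_col], dim=-1))


class LatentODEVideoModel(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        # Frame encoder: 64x64 grayscale frame -> latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 16 * 16, latent_dim),
        )
        # Frame decoder: latent vector -> 64x64 frame.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.dynamics = LatentDynamics(latent_dim)

    def integrate(self, z0, times, steps_per_interval=4):
        """Fixed-step Euler integration of the latent ODE between requested times."""
        zs, z = [], z0
        for t0, t1 in zip(times[:-1], times[1:]):
            dt = (t1 - t0) / steps_per_interval
            for k in range(steps_per_interval):
                z = z + dt * self.dynamics(z, t0 + k * dt)
            zs.append(z)
        return torch.stack(zs, dim=1)               # (B, len(times) - 1, latent_dim)

    def forward(self, context_frame, times):
        z0 = self.encoder(context_frame)             # initial latent state
        zs = self.integrate(z0, times)               # latent trajectory
        b, t, d = zs.shape
        frames = self.decoder(zs.reshape(b * t, d))  # decode every latent point
        return frames.reshape(b, t, 1, 64, 64)


# Usage: condition on one frame and roll the latent ODE forward for 10 steps;
# extrapolating past the training horizon just means extending `times`.
model = LatentODEVideoModel()
frame0 = torch.rand(8, 1, 64, 64)             # stand-in for Moving MNIST frames
times = torch.linspace(0.0, 1.0, steps=11)
pred = model(frame0, times)
print(pred.shape)                              # torch.Size([8, 10, 1, 64, 64])
```

In a fuller implementation the Euler loop would typically be replaced by an adaptive solver (for example, `odeint` from the torchdiffeq package) and trained with a reconstruction or variational objective; the sketch only captures the encode, integrate, decode structure described in the abstract.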
Related papers
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction [20.701792842768747]
We propose a novel video prediction model, which has infinite-dimensional latent variables over the temporal domain.
Our model is able to achieve temporally continuous prediction, i.e., predicting at an arbitrarily high frame rate in an unsupervised way.
arXiv Detail & Related papers (2023-12-11T16:12:43Z)
- Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling [28.530765643908083]
We decouple spatial-temporal modeling and integrate an image- and a video-language model to learn fine-grained visual understanding.
We propose a novel pre-training objective, Temporal Referring Modeling, which requires the model to identify temporal positions of events in video sequences.
Our model outperforms previous work pre-trained on orders of magnitude larger datasets.
arXiv Detail & Related papers (2022-10-08T07:03:31Z)
- Enhancing Spatiotemporal Prediction Model using Modular Design and Beyond [2.323220706791067]
It is challenging to predict sequences that vary in both time and space.
The mainstream method is to model spatial and temporal structures at the same time.
A modular design is proposed, which embeds a sequence model into two modules: a spatial encoder-decoder and a predictor.
arXiv Detail & Related papers (2022-10-04T10:09:35Z)
- Modelling Latent Dynamics of StyleGAN using Neural ODEs [52.03496093312985]
We learn the trajectory of independently inverted latent codes from GANs.
The learned continuous trajectory allows us to perform infinite frame interpolation and consistent video manipulation.
Our method achieves state-of-the-art performance but with much less computation.
arXiv Detail & Related papers (2022-08-23T21:20:38Z)
- StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN [70.31913835035206]
We present a novel approach to the video synthesis problem that helps to greatly improve visual quality.
We make use of a pre-trained StyleGAN network, the latent space of which allows control over the appearance of the objects it was trained for.
Our temporal architecture is then trained not on sequences of RGB frames, but on sequences of StyleGAN latent codes.
arXiv Detail & Related papers (2021-07-15T09:58:15Z)
- Efficient training for future video generation based on hierarchical disentangled representation of latent variables [66.94698064734372]
We propose a novel method for generating future prediction videos with less memory usage than the conventional methods.
We achieve high efficiency by training our method in two stages: (1) image reconstruction to encode video frames into latent variables, and (2) latent variable prediction to generate the future sequence (a generic sketch of this two-stage pattern appears after this list).
Our experiments show that the proposed method can efficiently generate future prediction videos, even for complex datasets that cannot be handled by previous methods.
arXiv Detail & Related papers (2021-06-07T10:43:23Z)
- Learning Temporal Dynamics from Cycles in Narrated Video [85.89096034281694]
We propose a self-supervised solution to the problem of learning to model how the world changes as time elapses.
Our model learns modality-agnostic functions to predict forward and backward in time, which must undo each other when composed.
We apply the learned dynamics model without further training to various tasks, such as predicting future action and temporally ordering sets of images.
arXiv Detail & Related papers (2021-01-07T02:41:32Z)
- Stochastic Latent Residual Video Prediction [0.0]
This paper introduces a novel temporal model whose dynamics are governed in a latent space by a residual update rule.
It naturally models video dynamics, and this simpler, more interpretable latent model outperforms prior state-of-the-art methods on challenging datasets.
arXiv Detail & Related papers (2020-02-21T10:44:01Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
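Several entries above (for example, the hierarchical disentangled-representation paper and StyleVideoGAN) share the recipe referenced in the note earlier: first fit a per-frame autoencoder, then freeze it and train a temporal predictor purely on latent codes. The sketch below illustrates only that generic two-stage pattern; the architectures, losses, and optimisers are assumptions for illustration and do not reproduce any of the cited methods.

```python
# Generic two-stage training pattern (illustrative assumptions throughout):
# stage 1 fits a per-frame autoencoder, stage 2 freezes it and fits a
# temporal predictor that operates only on latent codes.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 32

# Stage-1 model: simple per-frame autoencoder for 64x64 grayscale frames.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU(),
                        nn.Linear(256, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                        nn.Linear(256, 64 * 64), nn.Sigmoid())

# Stage-2 model: temporal predictor that never sees pixels.
predictor = nn.GRU(LATENT_DIM, LATENT_DIM, batch_first=True)


def stage1_step(frames, opt):
    """frames: (B, 1, 64, 64). One reconstruction step for the autoencoder."""
    recon = decoder(encoder(frames)).view(frames.shape)
    loss = F.mse_loss(recon, frames)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


def stage2_step(clips, opt):
    """clips: (B, T, 1, 64, 64). Predict the next latent from the previous ones."""
    b, t = clips.shape[:2]
    with torch.no_grad():                          # the autoencoder stays frozen here
        z = encoder(clips.reshape(b * t, 1, 64, 64)).reshape(b, t, LATENT_DIM)
    pred, _ = predictor(z[:, :-1])                 # predict z_2..z_T from z_1..z_{T-1}
    loss = F.mse_loss(pred, z[:, 1:])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# Stage 1: train encoder/decoder on individual frames.
opt1 = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
stage1_step(torch.rand(16, 1, 64, 64), opt1)

# Stage 2: train only the latent predictor on short clips.
opt2 = torch.optim.Adam(predictor.parameters(), lr=1e-3)
stage2_step(torch.rand(8, 12, 1, 64, 64), opt2)
```

The appeal of the pattern is that the expensive pixel-space model is trained once on single frames, while the sequence model only ever sees low-dimensional latent codes, which keeps memory usage and training cost down.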
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.