STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video
Prediction
- URL: http://arxiv.org/abs/2312.06486v1
- Date: Mon, 11 Dec 2023 16:12:43 GMT
- Title: STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video
Prediction
- Authors: Xi Ye, Guillaume-Alexandre Bilodeau
- Abstract summary: We propose a novel video prediction model with infinite-dimensional latent variables over the spatio-temporal domain.
Our model achieves temporally continuous prediction, i.e., it predicts future video frames in an unsupervised way at an arbitrarily high frame rate.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting future frames of a video is challenging because it is difficult to
learn the uncertainty of the underlying factors influencing their contents. In
this paper, we propose a novel video prediction model, which has
infinite-dimensional latent variables over the spatio-temporal domain.
Specifically, we first decompose the video into motion and content information, then
use a neural stochastic differential equation to predict the temporal motion
information, and finally an image diffusion model autoregressively generates
each video frame conditioned on the predicted motion feature and the previous
frame. The greater expressiveness and stronger stochasticity-learning capability
of our model lead to state-of-the-art video prediction performance. Moreover,
our model achieves temporally continuous prediction, i.e., it can predict future
video frames at an arbitrarily high frame rate in an unsupervised way. Our code
is available at
\url{https://github.com/XiYe20/STDiffProject}.
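To make the pipeline described in the abstract concrete, the following is a minimal, hypothetical sketch of its three stages: a motion feature evolved by a neural stochastic differential equation, and a conditional diffusion decoder that generates each frame from the predicted motion feature and the previous frame. This is not the authors' implementation; all module names, layer sizes, the Euler-Maruyama integrator, and the simplified reverse-diffusion loop are assumptions made for illustration. The actual code is in the linked repository.

import torch
import torch.nn as nn

class MotionSDE(nn.Module):
    # Assumed stand-in for the temporal motion model: a neural SDE over
    # motion features, integrated with Euler-Maruyama.
    def __init__(self, dim=64):
        super().__init__()
        self.drift = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.diffusion = nn.Sequential(nn.Linear(dim, dim), nn.Softplus())

    def forward(self, z0, t0, t1, steps=8):
        # Integrate dz = f(z) dt + g(z) dW from t0 to t1; t0 and t1 are floats,
        # so fractional time gaps (continuous-time prediction) are allowed.
        dt = (t1 - t0) / steps
        z = z0
        for _ in range(steps):
            dw = torch.randn_like(z) * (dt ** 0.5)
            z = z + self.drift(z) * dt + self.diffusion(z) * dw
        return z

class ConditionalDenoiser(nn.Module):
    # Toy denoiser: predicts noise from (noisy frame, previous frame, motion feature).
    def __init__(self, channels=3, motion_dim=64):
        super().__init__()
        self.motion_proj = nn.Linear(motion_dim, channels)
        self.net = nn.Sequential(
            nn.Conv2d(3 * channels, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x_t, prev_frame, motion):
        m = self.motion_proj(motion)[:, :, None, None].expand_as(x_t)
        return self.net(torch.cat([x_t, prev_frame, m], dim=1))

@torch.no_grad()
def predict_next_frame(denoiser, prev_frame, motion, num_steps=50):
    # Heavily simplified reverse-diffusion loop; a real DDPM/DDIM noise
    # schedule would replace the placeholder update below.
    x = torch.randn_like(prev_frame)
    for _ in range(num_steps):
        eps = denoiser(x, prev_frame, motion)
        x = x - eps / num_steps
    return x

# Autoregressive rollout at arbitrary (possibly fractional) time stamps.
sde, denoiser = MotionSDE(), ConditionalDenoiser()
z = torch.randn(1, 64)              # motion feature (would come from a motion encoder)
frame = torch.rand(1, 3, 64, 64)    # last observed frame (content information)
for t0, t1 in [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5)]:
    z = sde(z, t0, t1)
    frame = predict_next_frame(denoiser, frame, z)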
Related papers
- State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend [3.910356300831074]
We propose a state-space decomposition video prediction model that decomposes the overall video frame generation into deterministic appearance prediction and motion prediction.
We infer the long-term motion trend from conditional frames to guide the generation of future frames that exhibit high consistency with the conditional frames.
arXiv Detail & Related papers (2024-04-17T17:19:48Z) - Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - HARP: Autoregressive Latent Video Prediction with High-Fidelity Image
Generator [90.74663948713615]
We train an autoregressive latent video prediction model capable of predicting high-fidelity future frames.
We produce high-resolution (256x256) videos with minimal modification to existing models.
arXiv Detail & Related papers (2022-09-15T08:41:57Z) - Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moire patterns, appearing as color distortions, severely degrade image and video quality when filming a screen with digital cameras.
We study how to remove such undesirable moire patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z) - Video Prediction at Multiple Scales with Hierarchical Recurrent Networks [24.536256844130996]
We propose MSPred, a novel video prediction model able to forecast possible future outcomes at different levels of granularity simultaneously.
By combining spatial and temporal downsampling, MSPred is able to efficiently predict abstract representations over long time horizons.
In our experiments, we demonstrate that our proposed model accurately predicts future video frames as well as other representations on various scenarios.
arXiv Detail & Related papers (2022-03-17T13:08:28Z) - SLAMP: Stochastic Latent Appearance and Motion Prediction [14.257878210585014]
Motion is an important cue for video prediction and often utilized by separating video content into static and dynamic components.
Most of the previous work utilizing motion is deterministic but there are methods that can model the inherent uncertainty of the future.
In this paper, we reason about appearance and motion in the video stochastically by predicting the future based on the motion history.
arXiv Detail & Related papers (2021-08-05T17:52:18Z) - Revisiting Hierarchical Approach for Persistent Long-Term Video
Prediction [55.4498466252522]
We set a new standard of video prediction with orders of magnitude longer prediction time than existing approaches.
Our method predicts future frames by first estimating a sequence of semantic structures and subsequently translating the structures to pixels by video-to-video translation.
We evaluate our method on three challenging datasets involving car driving and human dancing, and demonstrate that it can generate complicated scene structures and motions over a very long time horizon.
arXiv Detail & Related papers (2021-04-14T08:39:38Z) - Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
Besides content distribution, our model learns motion distribution, which is novel to handle the small movements of surgical tools.
arXiv Detail & Related papers (2021-03-18T15:12:06Z) - VAE^2: Preventing Posterior Collapse of Variational Video Predictions in
the Wild [131.58069944312248]
We propose a novel VAE structure, dubbed VAE-in-VAE or VAE$^2$.
We treat part of the observed video sequence as a random transition state that bridges its past and future, and maximize the likelihood of a Markov Chain over the video sequence under all possible transition states.
VAE$^2$ can mitigate the posterior collapse problem to a large extent, as it breaks the direct dependence between future and observation and does not directly regress the determinate future provided by the training data.
arXiv Detail & Related papers (2021-01-28T15:06:08Z)