State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend
- URL: http://arxiv.org/abs/2404.11576v1
- Date: Wed, 17 Apr 2024 17:19:48 GMT
- Title: State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend
- Authors: Fei Cui, Jiaojiao Fang, Xiaojiang Wu, Zelong Lai, Mengke Yang, Menghan Jia, Guizhong Liu
- Abstract summary: We propose a state-space decomposition video prediction model that decomposes the overall video frame generation into deterministic appearance prediction and motion prediction.
We infer the long-term motion trend from conditional frames to guide the generation of future frames that exhibit high consistency with the conditional frames.
- Score: 3.910356300831074
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Stochastic video prediction enables the consideration of uncertainty in future motion, thereby providing a better reflection of the dynamic nature of the environment. Stochastic video prediction methods based on image auto-regressive recurrent models need to feed their predictions back into the latent space. Conversely, state-space models, which decouple frame synthesis and temporal prediction, prove to be more efficient. However, inferring long-term temporal information about motion and generalizing to dynamic scenarios under non-stationary assumptions remain unresolved challenges. In this paper, we propose a state-space decomposition stochastic video prediction model that decomposes the overall video frame generation into deterministic appearance prediction and stochastic motion prediction. Through adaptive decomposition, the model's generalization capability to dynamic scenarios is enhanced. In the context of motion prediction, obtaining a prior on the long-term trend of future motion is crucial. Thus, in the stochastic motion prediction branch, we infer the long-term motion trend from conditional frames to guide the generation of future frames that exhibit high consistency with the conditional frames. Experimental results demonstrate that our model outperforms baselines on multiple datasets.
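A minimal sketch of the decomposition described above, with hypothetical module names and shapes (an illustration of the two-branch idea, not the authors' implementation): a deterministic recurrent appearance state and a stochastic motion state evolve separately, the motion prior is conditioned on a trend inferred from the conditional frames, and a readout fuses the two branches.

```python
import torch
import torch.nn as nn

class DecomposedPredictor(nn.Module):
    def __init__(self, z_dim=64, h_dim=128):
        super().__init__()
        self.appearance_rnn = nn.GRUCell(z_dim, h_dim)   # deterministic appearance state
        self.motion_rnn = nn.GRUCell(z_dim, h_dim)       # stochastic motion state
        self.trend_enc = nn.GRU(z_dim, h_dim, batch_first=True)
        self.prior = nn.Linear(2 * h_dim, 2 * z_dim)     # motion prior: mean, log-variance
        self.readout = nn.Linear(2 * h_dim, z_dim)       # fuse branches into a frame latent

    def forward(self, cond_latents, horizon):
        # Infer the long-term motion trend from the conditional frames.
        _, trend = self.trend_enc(cond_latents)          # (1, B, h_dim)
        trend = trend.squeeze(0)
        B = cond_latents.size(0)
        h_app = torch.zeros(B, trend.size(1), device=cond_latents.device)
        h_mot = torch.zeros_like(h_app)
        z = cond_latents[:, -1]                          # last observed frame latent
        preds = []
        for _ in range(horizon):
            h_app = self.appearance_rnn(z, h_app)        # deterministic update
            mu, logvar = self.prior(torch.cat([h_mot, trend], -1)).chunk(2, -1)
            z_mot = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # trend-guided sample
            h_mot = self.motion_rnn(z_mot, h_mot)
            z = self.readout(torch.cat([h_app, h_mot], -1))
            preds.append(z)
        return torch.stack(preds, dim=1)                 # (B, horizon, z_dim)
```

Sampling the motion latent from a trend-conditioned prior is what lets the rollout remain stochastic while staying consistent with the conditional frames.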
Related papers
- GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis [71.24791230358065]
We introduce a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis.
GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes.
Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
arXiv Detail & Related papers (2024-05-30T06:47:55Z)
- STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction [20.701792842768747]
We propose a novel video prediction model, which has infinite-dimensional latent variables over the temporal domain.
Our model achieves temporally continuous prediction, i.e., it can predict at an arbitrarily high frame rate in an unsupervised way.
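One plausible way to realize continuous-time prediction, sketched under the assumption of an SDE-style latent process (names and interfaces are ours, not STDiff's): integrate the latent forward with Euler-Maruyama and read it out at arbitrary query times.

```python
import torch
import torch.nn as nn

class LatentSDE(nn.Module):
    def __init__(self, z_dim=32):
        super().__init__()
        self.drift = nn.Sequential(nn.Linear(z_dim + 1, 64), nn.Tanh(),
                                   nn.Linear(64, z_dim))
        self.log_sigma = nn.Parameter(torch.zeros(z_dim))  # diffusion scale

    def sample_path(self, z0, query_times, dt=0.01):
        """z0: (B, z_dim); query_times: positive floats. Returns one latent per query."""
        z, t, out, i = z0, 0.0, [], 0
        times = sorted(query_times)
        while i < len(times):
            f = self.drift(torch.cat([z, torch.full_like(z[..., :1], t)], -1))
            noise = torch.randn_like(z) * self.log_sigma.exp() * dt ** 0.5
            z = z + f * dt + noise                        # Euler-Maruyama step
            t += dt
            while i < len(times) and t >= times[i]:       # record z at each query time
                out.append(z)
                i += 1
        return torch.stack(out)                           # (num_queries, B, z_dim)
```

Querying at times like [0.1, 0.15, 0.2] instead of integer steps is what makes the frame rate a free choice at test time.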
arXiv Detail & Related papers (2023-12-11T16:12:43Z)
- HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator [90.74663948713615]
We train an autoregressive latent video prediction model capable of predicting high-fidelity future frames.
We produce high-resolution (256x256) videos with minimal modification to existing models.
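An illustrative rollout loop for this style of model (function names are placeholders, not HARP's API): prediction happens in a compact latent space, and a separately trained high-fidelity decoder maps each predicted latent back to pixels.

```python
import torch

def rollout(encoder, dynamics, decoder, frames, horizon):
    """frames: (B, T, C, H, W) context clip; returns (B, horizon, C, H, W)."""
    latents = [encoder(frames[:, t]) for t in range(frames.size(1))]
    preds = []
    for _ in range(horizon):
        z_next = dynamics(torch.stack(latents, dim=1))  # predict next latent code
        latents.append(z_next)                          # feed back in latent space
        preds.append(decoder(z_next))                   # decode to high-res pixels
    return torch.stack(preds, dim=1)
```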
arXiv Detail & Related papers (2022-09-15T08:41:57Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations, with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms state-of-the-art methods in short-term prediction and achieves substantially improved long-term prediction.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- SLAMP: Stochastic Latent Appearance and Motion Prediction [14.257878210585014]
Motion is an important cue for video prediction and often utilized by separating video content into static and dynamic components.
Most of the previous work utilizing motion is deterministic, but there are methods that can model the inherent uncertainty of the future.
In this paper, we reason about appearance and motion in the video stochastically by predicting the future based on the motion history.
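A hedged sketch of one common way such appearance/motion decompositions are combined at the pixel level (an assumption on our part, not a quote of SLAMP's architecture): warp the previous frame with a predicted optical flow and blend it with a direct appearance prediction via a learned mask.

```python
import torch
import torch.nn.functional as F

def fuse(prev_frame, flow, appearance_pred, mask):
    """prev_frame: (B,C,H,W); flow: (B,2,H,W) pixel offsets;
    appearance_pred: (B,C,H,W); mask: (B,1,H,W) in [0,1]."""
    B, _, H, W = prev_frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float()          # (H, W, 2) base coordinates
    grid = grid + flow.permute(0, 2, 3, 1)                # shift each pixel by its flow
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1         # normalize to [-1, 1]
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    warped = F.grid_sample(prev_frame, grid, align_corners=True)
    return mask * warped + (1 - mask) * appearance_pred   # motion vs. appearance blend
```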
arXiv Detail & Related papers (2021-08-05T17:52:18Z)
- Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction [55.4498466252522]
We set a new standard of video prediction with orders of magnitude longer prediction time than existing approaches.
Our method predicts future frames by first estimating a sequence of semantic structures and subsequently translating the structures to pixels by video-to-video translation.
We evaluate our method on three challenging datasets involving car driving and human dancing, and demonstrate that it can generate complicated scene structures and motions over a very long time horizon.
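A sketch of the two-stage pipeline with hypothetical interfaces: dynamics are predicted in a compact semantic space first, and pixels are synthesized only afterwards, which keeps errors from compounding at the pixel level over long horizons.

```python
import torch

def predict_long_term(structure_model, translator, context_frames, horizon):
    # Stage 1: roll out semantic structures (e.g., layouts) far into the future.
    structures = structure_model(context_frames, horizon)   # (B, horizon, ...)
    # Stage 2: translate each structure to pixels via video-to-video translation.
    frames = [translator(structures[:, t]) for t in range(horizon)]
    return torch.stack(frames, dim=1)
```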
arXiv Detail & Related papers (2021-04-14T08:39:38Z)
- Long Term Motion Prediction Using Keyposes [122.22758311506588]
We argue that, to achieve long-term forecasting, predicting human pose at every time instant is unnecessary; a few significant poses suffice.
We call such poses "keyposes", and approximate complex motions by linearly interpolating between subsequent keyposes.
We show that learning the sequence of such keyposes allows us to predict very long-term motion, up to 5 seconds into the future.
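The interpolation step is concrete enough to write down; a minimal sketch with our own variable names (not the paper's code):

```python
import numpy as np

def interpolate_keyposes(keyposes, key_times, query_times):
    """keyposes: (K, J, 3) joint positions; key_times: (K,) increasing;
    query_times: (Q,). Returns (Q, J, 3) linearly blended poses."""
    poses = []
    for t in query_times:
        i = np.searchsorted(key_times, t, side="right") - 1   # left keypose index
        i = np.clip(i, 0, len(key_times) - 2)
        w = (t - key_times[i]) / (key_times[i + 1] - key_times[i])
        poses.append((1 - w) * keyposes[i] + w * keyposes[i + 1])
    return np.stack(poses)
```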
arXiv Detail & Related papers (2020-12-08T20:45:51Z)
- A Spatio-temporal Transformer for 3D Human Motion Prediction [39.31212055504893]
We propose a Transformer-based architecture for the task of generative modelling of 3D human motion.
We empirically show that this effectively learns the underlying motion dynamics and reduces the error accumulation over time observed in auto-regressive models.
arXiv Detail & Related papers (2020-04-18T19:49:28Z)
- Stochastic Latent Residual Video Prediction [0.0]
This paper introduces a novel temporal model whose dynamics are governed in a latent space by a residual update rule.
It naturally models video dynamics, allowing our simpler and more interpretable latent model to outperform prior state-of-the-art methods on challenging datasets.
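A sketch of a residual update rule of the kind described (our names, one plausible form): the latent state changes by a learned increment rather than being re-predicted from scratch, so the dynamics read like a discretized ODE.

```python
import torch
import torch.nn as nn

class ResidualDynamics(nn.Module):
    def __init__(self, y_dim=64, z_dim=16):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(y_dim + z_dim, 128), nn.ReLU(),
                               nn.Linear(128, y_dim))

    def step(self, y, z):
        # y_{t+1} = y_t + f(y_t, z_{t+1}): a learned residual increment.
        return y + self.f(torch.cat([y, z], dim=-1))
```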
arXiv Detail & Related papers (2020-02-21T10:44:01Z)
- Deep Variational Luenberger-type Observer for Stochastic Video Prediction [46.82873654555665]
We study the problem of video prediction by combining the interpretability of state-space models with the representational power of deep neural networks.
Our model builds upon a variational encoder, which transforms the input video into a latent feature space, and a Luenberger-type observer, which captures the dynamic evolution of the latent features.
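For reference, a classical Luenberger observer step (standard control theory; the paper's deep variational variant will differ in form): the state estimate is corrected by the gain-weighted mismatch between the observation and its prediction.

```python
import numpy as np

def observer_step(z_hat, y, A, C, L):
    """z_hat: (n,) current state estimate; y: (m,) observation;
    A: (n,n) dynamics; C: (m,n) observation map; L: (n,m) observer gain.
    Update: z_hat_next = A z_hat + L (y - C z_hat)."""
    return A @ z_hat + L @ (y - C @ z_hat)
```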
arXiv Detail & Related papers (2020-02-12T06:59:04Z)