SLAMP: Stochastic Latent Appearance and Motion Prediction
- URL: http://arxiv.org/abs/2108.02760v1
- Date: Thu, 5 Aug 2021 17:52:18 GMT
- Title: SLAMP: Stochastic Latent Appearance and Motion Prediction
- Authors: Adil Kaan Akan, Erkut Erdem, Aykut Erdem, Fatma Güney
- Abstract summary: Motion is an important cue for video prediction and often utilized by separating video content into static and dynamic components.
Most of the previous work utilizing motion is deterministic, but there are stochastic methods that can model the inherent uncertainty of the future.
In this paper, we reason about appearance and motion in the video stochastically by predicting the future based on the motion history.
- Score: 14.257878210585014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motion is an important cue for video prediction and often utilized by
separating video content into static and dynamic components. Most of the
previous work utilizing motion is deterministic but there are stochastic
methods that can model the inherent uncertainty of the future. Existing
stochastic models either do not reason about motion explicitly or make limiting
assumptions about the static part. In this paper, we reason about appearance
and motion in the video stochastically by predicting the future based on the
motion history. Explicit reasoning about motion without history already reaches
the performance of current stochastic models. The motion history further
improves the results by allowing the model to predict consistent dynamics several
frames into the future. Our model performs comparably to state-of-the-art models
on generic video prediction datasets; however, it significantly outperforms them
on two challenging real-world autonomous driving datasets with complex motion and
a dynamic background.
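The appearance-motion decomposition described above can be illustrated with a warp-and-blend step: the previous frame is warped by predicted optical flow (the motion path), a second network predicts pixels directly (the appearance path), and a learned soft mask fuses the two. The PyTorch sketch below shows only this fusion under our assumptions; the paper's encoders, recurrent modules, and stochastic latent variables are omitted, and the function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp a frame (B, C, H, W) with dense optical flow (B, 2, H, W)."""
    b, _, h, w = frame.shape
    # Base sampling grid in normalized [-1, 1] coordinates for grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=frame.device),
        torch.linspace(-1, 1, w, device=frame.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Convert pixel-space flow to normalized coordinate offsets.
    offset = torch.stack(
        (flow[:, 0] / ((w - 1) / 2), flow[:, 1] / ((h - 1) / 2)), dim=-1
    )
    return F.grid_sample(frame, base + offset, align_corners=True)

def fuse(prev_frame, flow, appearance_pred, mask):
    """Blend the motion-warped previous frame with the pixel-space
    appearance prediction using a soft mask in [0, 1]."""
    return mask * warp(prev_frame, flow) + (1 - mask) * appearance_pred
```

In the full model, flow, appearance_pred, and mask would all come from learned decoders conditioned on the frame history and sampled latent variables.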
Related papers
- GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis [71.24791230358065]
We introduce a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis.
GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes.
Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
arXiv Detail & Related papers (2024-05-30T06:47:55Z)
- State-space Decomposition Model for Video Prediction Considering Long-term Motion Trend [3.910356300831074]
We propose a state-space decomposition video prediction model that decomposes the overall video frame generation into deterministic appearance prediction and motion prediction.
We infer the long-term motion trend from conditional frames to guide the generation of future frames that exhibit high consistency with the conditional frames.
arXiv Detail & Related papers (2024-04-17T17:19:48Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction [20.701792842768747]
We propose a novel video prediction model, which has infinite-dimensional latent variables over the temporal domain.
Our model achieves temporally continuous prediction, i.e., it can predict at an arbitrarily high frame rate in an unsupervised way.
arXiv Detail & Related papers (2023-12-11T16:12:43Z)
- PREF: Predictability Regularized Neural Motion Fields [68.60019434498703]
Knowing 3D motions in a dynamic scene is essential to many vision applications.
We leverage a neural motion field for estimating the motion of all points in a multiview setting.
We propose to regularize the estimated motion to be predictable.
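PREF's central idea, regularizing estimated motion to be predictable, can be sketched as an auxiliary loss that penalizes motion a small forecaster cannot explain from its own past. The window size, forecaster architecture, and squared-error form below are our assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

class MotionForecaster(nn.Module):
    """Tiny forecaster: predicts a point's next motion from a short
    window of its past motions (hypothetical architecture)."""
    def __init__(self, window=3, dim=3, hidden=64):
        super().__init__()
        self.window = window
        self.net = nn.Sequential(
            nn.Linear(window * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, past):                    # past: (N, window, dim)
        return self.net(past.flatten(1))

def predictability_loss(motions, forecaster):
    """Penalize estimated motion the forecaster cannot explain.
    motions: (N, T, dim) per-point motion over T time steps."""
    w, total = forecaster.window, 0.0
    for t in range(w, motions.shape[1]):
        pred = forecaster(motions[:, t - w:t])  # (N, dim)
        total = total + (pred - motions[:, t]).pow(2).mean()
    return total / (motions.shape[1] - w)
```

Such a term would be added, with some weight, to the reconstruction loss that fits the motion field to the multiview observations.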
arXiv Detail & Related papers (2022-09-21T22:32:37Z)
- Stochastic Video Prediction with Structure and Motion [14.424465835834042]
We propose to factorize video observations into static and dynamic components.
By learning separate distributions of changes in foreground and background, we can decompose the scene into static and dynamic parts.
Our experiments demonstrate that disentangling structure and motion helps video prediction, leading to better future predictions in complex driving scenarios.
arXiv Detail & Related papers (2022-03-20T11:29:46Z)
- Generating Smooth Pose Sequences for Diverse Human Motion Prediction [90.45823619796674]
We introduce a unified deep generative network for both diverse and controllable motion prediction.
Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy.
arXiv Detail & Related papers (2021-08-19T00:58:00Z)
- Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
Besides the content distribution, our model learns a motion distribution, which is novel in handling the small movements of surgical tools.
arXiv Detail & Related papers (2021-03-18T15:12:06Z)
- Long Term Motion Prediction Using Keyposes [122.22758311506588]
We argue that, to achieve long term forecasting, predicting human pose at every time instant is unnecessary.
Instead, we predict only a few representative poses, which we call "keyposes", and approximate complex motions by linearly interpolating between subsequent keyposes.
We show that learning the sequence of such keyposes allows us to predict very long term motion, up to 5 seconds in the future.
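The reconstruction step in this entry reduces to linear interpolation between keyposes. A minimal sketch, assuming poses are arrays of 3D joint positions (the paper's learned keypose prediction itself is omitted):

```python
import numpy as np

def interpolate_keyposes(keyposes, keytimes, query_times):
    """Densify a motion by linearly interpolating between keyposes.
    keyposes: (K, J, 3) joint positions; keytimes: (K,) strictly
    increasing timestamps; query_times: (T,) times to reconstruct."""
    keyposes = np.asarray(keyposes, dtype=float)
    flat = keyposes.reshape(len(keyposes), -1)          # (K, J*3)
    dense = np.stack(
        [np.interp(query_times, keytimes, flat[:, d])   # per-coordinate
         for d in range(flat.shape[1])],
        axis=-1,
    )
    return dense.reshape(len(query_times), *keyposes.shape[1:])

# e.g., reconstruct 5 seconds of motion at 30 fps from a few keyposes:
# dense = interpolate_keyposes(keyposes, keytimes, np.linspace(0.0, 5.0, 150))
```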
arXiv Detail & Related papers (2020-12-08T20:45:51Z)
- Dynamic Future Net: Diversified Human Motion Generation [31.987602940970888]
Human motion modelling is crucial in many areas such as computer graphics, vision and virtual reality.
We present Dynamic Future Net, a new deep learning model that explicitly focuses on the intrinsic stochasticity of human motion dynamics.
Our model can generate a large number of high-quality motions with arbitrary duration and visually convincing variations in both space and time.
arXiv Detail & Related papers (2020-08-25T02:31:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.