Probabilistic Future Prediction for Video Scene Understanding
- URL: http://arxiv.org/abs/2003.06409v2
- Date: Fri, 17 Jul 2020 10:07:40 GMT
- Title: Probabilistic Future Prediction for Video Scene Understanding
- Authors: Anthony Hu, Fergal Cotter, Nikhil Mohan, Corina Gurau, Alex Kendall
- Abstract summary: We present a novel deep learning architecture for probabilistic future prediction from video.
We predict the future semantics, geometry and motion of complex real-world urban scenes and use this representation to control an autonomous vehicle.
- Score: 11.236856606065514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel deep learning architecture for probabilistic future
prediction from video. We predict the future semantics, geometry and motion of
complex real-world urban scenes and use this representation to control an
autonomous vehicle. This work is the first to jointly predict ego-motion,
static scene, and the motion of dynamic agents in a probabilistic manner, which
allows sampling consistent, highly probable futures from a compact latent
space. Our model learns a representation from RGB video with a spatio-temporal
convolutional module. The learned representation can be explicitly decoded to
future semantic segmentation, depth, and optical flow, in addition to being an
input to a learnt driving policy. To model the stochasticity of the future, we
introduce a conditional variational approach which minimises the divergence
between the present distribution (what could happen given what we have seen)
and the future distribution (what we observe actually happens). During
inference, diverse futures are generated by sampling from the present
distribution.
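The conditional variational objective described above can be illustrated with a minimal sketch: if the present and future distributions are modelled as diagonal Gaussians (an assumption for illustration; the paper's distributions are produced by learned networks conditioned on observed and future frames), the divergence term has a closed form, and inference-time diversity comes from sampling the present distribution. Function names and the latent dimensionality here are hypothetical.

```python
import numpy as np

def kl_diag_gaussians(mu_f, logvar_f, mu_p, logvar_p):
    """KL( future || present ) for two diagonal Gaussians.

    Minimising this pulls the present distribution (conditioned on
    what has been seen) towards the future distribution (conditioned
    on what actually happens next).
    """
    var_f, var_p = np.exp(logvar_f), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_f + (var_f + (mu_f - mu_p) ** 2) / var_p - 1.0
    )

def sample_future(mu_p, logvar_p, rng):
    """At inference, draw a latent from the present distribution via
    the reparameterisation trick; each sample yields a different
    plausible future when decoded."""
    return mu_p + np.exp(0.5 * logvar_p) * rng.standard_normal(mu_p.shape)
```

Each sampled latent would then be decoded to future semantic segmentation, depth, and optical flow, and fed to the driving policy; the decoders themselves are omitted here.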
Related papers
- GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis [71.24791230358065]
We introduce a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis.
GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes.
Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
arXiv Detail & Related papers (2024-05-30T06:47:55Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction [20.701792842768747]
We propose a novel video prediction model, which has infinite-dimensional latent variables over the temporal domain.
Our model achieves temporally continuous prediction, i.e., it can predict at an arbitrarily high frame rate in an unsupervised way.
arXiv Detail & Related papers (2023-12-11T16:12:43Z)
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
- Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion [88.45326906116165]
We present a new framework that formulates the trajectory prediction task as a reverse process of motion indeterminacy diffusion (MID).
We encode the history behavior information and the social interactions as a state embedding and devise a Transformer-based diffusion model to capture the temporal dependencies of trajectories.
Experiments on the human trajectory prediction benchmarks including the Stanford Drone and ETH/UCY datasets demonstrate the superiority of our method.
arXiv Detail & Related papers (2022-03-25T16:59:08Z)
- Video Prediction at Multiple Scales with Hierarchical Recurrent Networks [24.536256844130996]
We propose MSPred, a novel video prediction model able to forecast possible future outcomes at different levels of granularity simultaneously.
By combining spatial and temporal downsampling, MSPred is able to efficiently predict abstract representations over long time horizons.
In our experiments, we demonstrate that our proposed model accurately predicts future video frames as well as other representations across various scenarios.
arXiv Detail & Related papers (2022-03-17T13:08:28Z)
- FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras [33.08698074581615]
We present FIERY: a probabilistic future prediction model in bird's-eye view from monocular cameras.
Our approach combines the perception, sensor fusion and prediction components of a traditional autonomous driving stack.
We show that our model outperforms previous prediction baselines on the NuScenes and Lyft datasets.
arXiv Detail & Related papers (2021-04-21T12:21:40Z)
- Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
Besides content distribution, our model learns motion distribution, which is novel to handle the small movements of surgical tools.
arXiv Detail & Related papers (2021-03-18T15:12:06Z)
- LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving [139.33800431159446]
LookOut is an approach to jointly perceive the environment and predict a diverse set of futures from sensor data.
We show that our model produces significantly more diverse and sample-efficient motion forecasts on a large-scale self-driving dataset.
arXiv Detail & Related papers (2021-01-16T23:19:22Z)
- Future Frame Prediction of a Video Sequence [5.660207256468971]
The ability to predict, anticipate and reason about future events is the essence of intelligence.
arXiv Detail & Related papers (2020-08-31T15:31:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.