StretchBEV: Stretching Future Instance Prediction Spatially and
Temporally
- URL: http://arxiv.org/abs/2203.13641v1
- Date: Fri, 25 Mar 2022 13:28:44 GMT
- Authors: Adil Kaan Akan, Fatma Güney
- Abstract summary: In self-driving cars, predicting the future location and motion of all the agents around the vehicle is a crucial requirement for planning.
Recently, a new joint formulation of perception and prediction has emerged that fuses rich sensory information perceived from multiple cameras into a compact bird's-eye view representation to perform prediction.
However, the quality of future predictions degrades over longer time horizons because multiple futures remain plausible.
In this work, we address this inherent uncertainty in future predictions with a stochastic temporal model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In self-driving, predicting the future location and motion of all the
agents around the vehicle is a crucial requirement for planning. Recently, a
new joint formulation of perception and prediction has emerged that fuses rich
sensory information perceived from multiple cameras into a compact bird's-eye
view representation to perform prediction. However, the quality of future
predictions degrades over longer time horizons because multiple futures remain
plausible. In this work, we address this inherent uncertainty in future
predictions with a stochastic temporal model. Our model learns temporal
dynamics in a latent space through stochastic residual updates at each time
step. By sampling from a learned distribution at each time step, we obtain
future predictions that are more diverse and also more accurate than previous
work, stretching both spatially, to farther regions in the scene, and
temporally, over longer time horizons. Despite processing each time step
separately, our model remains efficient by decoupling the learning of dynamics
from the generation of future predictions.
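The latent dynamics described in the abstract can be illustrated with a toy rollout that adds a sampled residual to the latent state at each time step. Everything below is a hedged sketch: the dimensions, the `sample_residual` helper, and the stand-in Gaussian parameters are illustrative assumptions, not the paper's actual learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent dimensionality (not taken from the paper).
LATENT_DIM = 8

def sample_residual(z, rng):
    """Sample a stochastic residual from a stand-in Gaussian whose
    mean and scale would, in the real model, come from a learned network."""
    mu = 0.1 * np.tanh(z)           # stand-in for a learned mean
    sigma = 0.05 * np.ones_like(z)  # stand-in for a learned scale
    return mu + sigma * rng.standard_normal(z.shape)

def rollout(z0, horizon, rng):
    """Unroll the latent dynamics: z_{t+1} = z_t + residual_t.
    Sampling a fresh residual per step yields a different plausible
    future on every call, mirroring the diversity the abstract describes."""
    states = [z0]
    z = z0
    for _ in range(horizon):
        z = z + sample_residual(z, rng)
        states.append(z)
    return np.stack(states)

z0 = np.zeros(LATENT_DIM)
traj = rollout(z0, horizon=5, rng=rng)
print(traj.shape)  # (6, 8): initial state plus 5 predicted steps
```

Repeating the rollout with different random draws produces multiple latent futures; in the paper's framing, a separate decoder would then turn each latent trajectory into bird's-eye-view instance predictions, which is what keeps dynamics learning decoupled from prediction generation.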
Related papers
- Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation [17.4088244981231]
Long-term action anticipation has become an important task for many applications such as autonomous driving and human-robot interaction.
We propose a novel Gated Temporal Diffusion (GTD) network that models the uncertainty of both the observation and the future predictions.
Our model achieves state-of-the-art results on the Breakfast, Assembly101 and 50Salads datasets in both stochastic and deterministic settings.
arXiv Detail & Related papers (2024-07-16T17:48:05Z) - HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention [76.37139809114274]
HPNet is a novel dynamic trajectory forecasting method.
We propose a Historical Prediction Attention module to automatically encode the dynamic relationship between successive predictions.
Our code is available at https://github.com/XiaolongTang23/HPNet.
arXiv Detail & Related papers (2024-04-09T14:42:31Z) - AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving [59.94343412438211]
We introduce the GPT style next token motion prediction into motion prediction.
Unlike language data, which is composed of homogeneous units (words), the elements in a driving scene can have complex spatio-temporal and semantic relations.
We propose to adopt three factorized attention modules with different neighbors for information aggregation and different position encoding styles to capture their relations.
arXiv Detail & Related papers (2024-03-20T06:22:37Z) - TAMFormer: Multi-Modal Transformer with Learned Attention Mask for Early
Intent Prediction [3.158346511479111]
We focus on pedestrians' early intention prediction in which, from a current observation of an urban scene, the model predicts the future activity of pedestrians that approach the street.
Our method is based on a multi-modal transformer that encodes past observations and produces multiple predictions at different anticipation times.
arXiv Detail & Related papers (2022-10-26T13:47:23Z) - Graph-based Spatial Transformer with Memory Replay for Multi-future
Pedestrian Trajectory Prediction [13.466380808630188]
We propose a model to forecast multiple paths based on a historical trajectory.
Our method can exploit the spatial information as well as correct the temporally inconsistent trajectories.
Our experiments show that the proposed model achieves state-of-the-art performance on multi-future prediction and competitive results for single-future prediction.
arXiv Detail & Related papers (2022-06-12T10:25:12Z) - Predicting Future Occupancy Grids in Dynamic Environment with
Spatio-Temporal Learning [63.25627328308978]
We propose a spatio-temporal prediction network pipeline to generate future occupancy predictions.
Compared to current SOTA, our approach predicts occupancy for a longer horizon of 3 seconds.
We publicly release our grid occupancy dataset based on nuScenes to support further research.
arXiv Detail & Related papers (2022-05-06T13:45:32Z) - Learning Future Object Prediction with a Spatiotemporal Detection
Transformer [1.1543275835002982]
We train a detection transformer to directly output future objects.
We extend existing transformers in two ways to capture scene dynamics.
Our final approach learns to capture the dynamics and make predictions on par with an oracle for 100 ms prediction horizons.
arXiv Detail & Related papers (2022-04-21T17:58:36Z) - Video Prediction at Multiple Scales with Hierarchical Recurrent Networks [24.536256844130996]
We propose a novel video prediction model able to forecast future possible outcomes of different levels of granularity simultaneously.
By combining spatial and temporal downsampling, MSPred is able to efficiently predict abstract representations over long time horizons.
In our experiments, we demonstrate that our proposed model accurately predicts future video frames as well as other representations on various scenarios.
arXiv Detail & Related papers (2022-03-17T13:08:28Z) - Revisiting Hierarchical Approach for Persistent Long-Term Video
Prediction [55.4498466252522]
We set a new standard of video prediction with orders of magnitude longer prediction time than existing approaches.
Our method predicts future frames by first estimating a sequence of semantic structures and subsequently translating the structures to pixels by video-to-video translation.
We evaluate our method on three challenging datasets involving car driving and human dancing, and demonstrate that it can generate complicated scene structures and motions over a very long time horizon.
arXiv Detail & Related papers (2021-04-14T08:39:38Z) - LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving [139.33800431159446]
LookOut is an approach to jointly perceive the environment and predict a diverse set of futures from sensor data.
We show that our model demonstrates significantly more diverse and sample-efficient motion forecasting in a large-scale self-driving dataset.
arXiv Detail & Related papers (2021-01-16T23:19:22Z) - Long Term Motion Prediction Using Keyposes [122.22758311506588]
We argue that, to achieve long term forecasting, predicting human pose at every time instant is unnecessary.
We call such poses "keyposes", and approximate complex motions by linearly interpolating between subsequent keyposes.
We show that learning the sequence of such keyposes allows us to predict very long term motion, up to 5 seconds in the future.
arXiv Detail & Related papers (2020-12-08T20:45:51Z)
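The keypose idea in the last entry above can be sketched as simple linear interpolation between sparse pose anchors. The keypose values and the `pose_at` helper below are hypothetical stand-ins; the paper learns the sequence of keyposes, whereas here they are hard-coded for illustration.

```python
import numpy as np

# Hypothetical keyposes: (time_in_seconds, pose_vector) pairs.
# A real pose vector would hold many joint coordinates; 2 dims suffice here.
keyposes = [
    (0.0, np.array([0.0, 0.0])),
    (2.0, np.array([1.0, 0.5])),
    (5.0, np.array([1.5, 2.0])),
]

def pose_at(t, keyposes):
    """Linearly interpolate the pose at time t between bracketing keyposes,
    clamping to the first/last keypose outside the covered interval."""
    times = [k[0] for k in keyposes]
    if t <= times[0]:
        return keyposes[0][1]
    if t >= times[-1]:
        return keyposes[-1][1]
    for (t0, p0), (t1, p1) in zip(keyposes, keyposes[1:]):
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return (1 - w) * p0 + w * p1

print(pose_at(1.0, keyposes))  # midpoint of the first segment
```

Predicting only a handful of keyposes and interpolating between them is what lets such a model reach long horizons (up to 5 seconds in the cited work) without generating a pose at every frame.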
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.