FIERY: Future Instance Prediction in Bird's-Eye View from Surround
Monocular Cameras
- URL: http://arxiv.org/abs/2104.10490v1
- Date: Wed, 21 Apr 2021 12:21:40 GMT
- Title: FIERY: Future Instance Prediction in Bird's-Eye View from Surround
Monocular Cameras
- Authors: Anthony Hu, Zak Murez, Nikhil Mohan, Sofía Dudas, Jeff Hawke, Vijay Badrinarayanan, Roberto Cipolla, Alex Kendall
- Abstract summary: We present FIERY: a probabilistic future prediction model in bird's-eye view from monocular cameras.
Our approach combines the perception, sensor fusion and prediction components of a traditional autonomous driving stack.
We show that our model outperforms previous prediction baselines on the NuScenes and Lyft datasets.
- Score: 33.08698074581615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driving requires interacting with road agents and predicting their future
behaviour in order to navigate safely. We present FIERY: a probabilistic future
prediction model in bird's-eye view from monocular cameras. Our model predicts
future instance segmentation and motion of dynamic agents that can be
transformed into non-parametric future trajectories. Our approach combines the
perception, sensor fusion and prediction components of a traditional autonomous
driving stack by estimating bird's-eye-view prediction directly from surround
RGB monocular camera inputs. FIERY learns to model the inherent stochastic
nature of the future directly from camera driving data in an end-to-end manner,
without relying on HD maps, and predicts multimodal future trajectories. We
show that our model outperforms previous prediction baselines on the NuScenes
and Lyft datasets. Code is available at https://github.com/wayveai/fiery
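The abstract describes a pipeline of per-camera encoding, lifting to bird's-eye view, temporal fusion, a probabilistic latent capturing the stochastic future, and an instance-segmentation decoder. Below is a minimal PyTorch sketch of that high-level flow; every module, shape, and name here is an illustrative assumption, not the authors' implementation (which is available at the linked repository).

```python
# Illustrative sketch only: module names and shapes are assumptions, not the
# actual FIERY code (see https://github.com/wayveai/fiery for the real model).
import torch
import torch.nn as nn

class BEVFuturePredictor(nn.Module):
    """Toy FIERY-like pipeline: lift surround-camera features to bird's-eye
    view, fuse past frames temporally, sample a latent future, and decode
    future instance-segmentation logits."""

    def __init__(self, feat_dim=64, bev_size=100, latent_dim=32,
                 n_future=4, n_classes=2):
        super().__init__()
        self.bev_size = bev_size
        self.encoder = nn.Conv2d(3, feat_dim, 3, stride=2, padding=1)  # per-camera CNN stub
        # Stand-in for camera-to-BEV lifting (the real model uses depth-weighted
        # projection with known camera intrinsics/extrinsics).
        self.to_bev = nn.AdaptiveAvgPool2d((bev_size, bev_size))
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        # Probabilistic head: a diagonal Gaussian over a latent that models
        # the inherent stochasticity of the future.
        self.dist_head = nn.Linear(feat_dim, 2 * latent_dim)
        self.decoder = nn.Conv2d(feat_dim + latent_dim, n_future * n_classes, 1)
        self.n_future, self.n_classes = n_future, n_classes

    def forward(self, images):                      # (B, T, n_cams, 3, H, W)
        B, T, N, C, H, W = images.shape
        feats = self.encoder(images.flatten(0, 2))  # (B*T*N, D, h, w)
        bev = self.to_bev(feats).view(B, T, N, -1, self.bev_size, self.bev_size).mean(2)
        # Temporal fusion over the pooled BEV state.
        state, _ = self.temporal(bev.mean(dim=(-2, -1)))               # (B, T, D)
        mu, log_sigma = self.dist_head(state[:, -1]).chunk(2, dim=-1)
        z = mu + log_sigma.exp() * torch.randn_like(mu)  # one sampled future
        z_map = z[:, :, None, None].expand(-1, -1, self.bev_size, self.bev_size)
        logits = self.decoder(torch.cat([bev[:, -1], z_map], dim=1))
        return logits.view(B, self.n_future, self.n_classes, self.bev_size, self.bev_size)
```

Sampling several latents z and decoding each is what makes the predicted futures multimodal; in the paper the latent distribution is learned with a variational objective rather than the single Gaussian sample shown here.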
Related papers
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Humanoid Locomotion as Next Token Prediction [84.21335675130021]
Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories.
We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot.
Our model can transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training, like walking backward.
arXiv Detail & Related papers (2024-02-29T18:57:37Z)
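For intuition, here is a minimal sketch of the "next token prediction" formulation in the entry above: a causal transformer over discretized sensorimotor tokens, trained with ordinary next-token cross-entropy. The sizes, names, and tokenization are assumptions, not the paper's code.

```python
# Minimal sketch of locomotion as next-token prediction: a causal transformer
# autoregressively predicts the next sensorimotor token. All sizes and names
# are illustrative assumptions.
import torch
import torch.nn as nn

class SensorimotorGPT(nn.Module):
    def __init__(self, vocab=1024, d_model=256, n_heads=8, n_layers=4, ctx=256):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(ctx, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):                      # (B, T) discretized obs/actions
        T = tokens.size(1)
        pos = torch.arange(T, device=tokens.device)
        x = self.tok(tokens) + self.pos(pos)
        # Causal mask so each step only attends to the past.
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        return self.head(self.blocks(x, mask=mask))  # next-token logits

# Training: cross-entropy between logits[:, :-1] and tokens[:, 1:].
```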
- BEVSeg2TP: Surround View Camera Bird's-Eye-View Based Joint Vehicle Segmentation and Ego Vehicle Trajectory Prediction [4.328789276903559]
Trajectory prediction is a key task for vehicle autonomy.
There is a growing interest in learning-based trajectory prediction.
We show that there is potential to improve the performance of perception through this joint approach.
arXiv Detail & Related papers (2023-12-20T15:02:37Z)
- JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds [79.00975648564483]
Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios.
This dataset provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective.
The objective is to predict the future positions of agents relative to the robot using raw sensory input data.
arXiv Detail & Related papers (2023-11-05T18:59:31Z)
- Comparison of Pedestrian Prediction Models from Trajectory and Appearance Data for Autonomous Driving [13.126949982768505]
The ability to anticipate pedestrian motion changes is a critical capability for autonomous vehicles.
In urban environments, pedestrians may enter the road area and create a high risk for driving.
This work presents a comparative evaluation of trajectory-only and appearance-based methods for pedestrian prediction.
arXiv Detail & Related papers (2023-05-25T11:24:38Z)
- LOPR: Latent Occupancy PRediction using Generative Models [49.15687400958916]
LiDAR generated occupancy grid maps (L-OGMs) offer a robust bird's eye-view scene representation.
We propose a framework that decouples occupancy prediction into: representation learning and prediction within the learned latent space.
arXiv Detail & Related papers (2022-10-03T22:04:00Z)
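A hedged sketch of the decoupling the LOPR entry above describes: stage one trains an autoencoder so occupancy grids get a compact latent; stage two trains a sequence model to predict future latents, which the frozen decoder can turn back into grids. All names and dimensions below are assumptions, not LOPR's actual code.

```python
# Sketch of decoupled occupancy prediction: (1) an autoencoder learns a latent
# representation of occupancy grids; (2) a sequence model predicts future
# latents. Dimensions assume 64x64 single-channel grids.
import torch
import torch.nn as nn

class OGMAutoencoder(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
                                 nn.Flatten(), nn.Linear(64 * 16 * 16, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64 * 16 * 16),
                                 nn.Unflatten(1, (64, 16, 16)),
                                 nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, 1, 4, 2, 1))

class LatentPredictor(nn.Module):
    """Predicts the next latent from a history of latents."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.rnn = nn.GRU(latent_dim, latent_dim, batch_first=True)

    def forward(self, z_hist):                 # (B, T, latent_dim)
        out, _ = self.rnn(z_hist)
        return out[:, -1]                      # next-step latent estimate

# Stage 1: train OGMAutoencoder to reconstruct grids.
# Stage 2: freeze it, encode past grids, train LatentPredictor on latent rollouts.
```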
- Conditioned Human Trajectory Prediction using Iterative Attention Blocks [70.36888514074022]
We present a simple yet effective pedestrian trajectory prediction model aimed at predicting pedestrian positions in urban-like environments.
Our model is a neural-based architecture that can run several layers of attention blocks and transformers in an iterative sequential fashion.
We show that without explicit introduction of social masks, dynamical models, social pooling layers, or complicated graph-like structures, it is possible to produce results on par with SoTA models.
arXiv Detail & Related papers (2022-06-29T07:49:48Z)
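The entry above can be illustrated with a small sketch: a single attention block applied iteratively over the encoded pedestrians, so interactions are refined over several passes without explicit social pooling or graph structures. Everything here (shapes, the offset decoder) is an assumption for illustration.

```python
# Sketch: one attention block applied iteratively over all pedestrians'
# encoded positions; interactions are refined across passes.
import torch
import torch.nn as nn

class IterativeAttentionPredictor(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_iters=3, horizon=12):
        super().__init__()
        self.embed = nn.Linear(2, d_model)          # (x, y) -> features
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, horizon * 2)  # future step offsets
        self.n_iters, self.horizon = n_iters, horizon

    def forward(self, last_pos):                    # (B, N_peds, 2)
        h = self.embed(last_pos)
        for _ in range(self.n_iters):               # iterative refinement
            h = self.block(h)                       # attention over pedestrians
        offsets = self.out(h).view(*last_pos.shape[:2], self.horizon, 2)
        return last_pos[:, :, None, :] + offsets.cumsum(dim=2)  # future positions
```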
- LaPred: Lane-Aware Prediction of Multi-Modal Future Trajectories of Dynamic Agents [10.869902339190949]
We propose a novel prediction model, referred to as the lane-aware prediction (LaPred) network.
LaPred uses the instance-level lane entities extracted from a semantic map to predict the multi-modal future trajectories.
The experiments conducted on the public nuScenes and Argoverse datasets demonstrate that the proposed LaPred method significantly outperforms the existing prediction models.
arXiv Detail & Related papers (2021-04-01T04:33:36Z)
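As a rough illustration of the lane-aware multimodal prediction summarized above: encode the agent's past trajectory and each lane polyline, attend from the agent to the lane entities, and decode K candidate futures. The structure below is a generic sketch under assumed shapes, not the LaPred architecture.

```python
# Sketch: attend from the agent's encoded past trajectory to instance-level
# lane features, then decode K trajectory modes. Shapes are assumptions.
import torch
import torch.nn as nn

class LaneAwarePredictor(nn.Module):
    def __init__(self, d_model=64, n_modes=6, horizon=12):
        super().__init__()
        self.traj_enc = nn.GRU(2, d_model, batch_first=True)
        self.lane_enc = nn.GRU(2, d_model, batch_first=True)   # lane polylines
        self.attn = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.head = nn.Linear(2 * d_model, n_modes * horizon * 2)
        self.n_modes, self.horizon = n_modes, horizon

    def forward(self, past, lanes):        # past: (B, T, 2); lanes: (B, L, P, 2)
        B, L, P, _ = lanes.shape
        _, h_traj = self.traj_enc(past)                    # (1, B, D)
        _, h_lane = self.lane_enc(lanes.flatten(0, 1))     # (1, B*L, D)
        lane_feats = h_lane.view(B, L, -1)                 # one feature per lane entity
        q = h_traj.transpose(0, 1)                         # (B, 1, D)
        ctx, _ = self.attn(q, lane_feats, lane_feats)      # lane-aware context
        out = self.head(torch.cat([q, ctx], -1).squeeze(1))
        return out.view(B, self.n_modes, self.horizon, 2)  # K candidate futures
```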
- Probabilistic Future Prediction for Video Scene Understanding [11.236856606065514]
We present a novel deep learning architecture for probabilistic future prediction from video.
We predict the future semantics and motion of complex real-world urban scenes and use this representation to control an autonomous vehicle.
arXiv Detail & Related papers (2020-03-13T17:48:21Z)
- Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction [57.56466850377598]
Reasoning over visual data is a desirable capability for robotics and vision-based applications.
In this paper, we present a graph-based framework to uncover relationships among the different objects in a scene for reasoning about pedestrian intent.
Pedestrian intent, defined as the future action of crossing or not crossing the street, is a crucial piece of information for autonomous vehicles.
arXiv Detail & Related papers (2020-02-20T18:50:44Z)
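A minimal sketch of the graph-reasoning idea in the last entry: a few rounds of message passing over scene-object nodes, followed by a binary crossing/not-crossing classifier on the pedestrian node. Node features, adjacency, and the readout are all assumptions, not the paper's model.

```python
# Sketch: message passing over a scene graph of objects, then a binary
# crossing/not-crossing intent classifier for the pedestrian node.
import torch
import torch.nn as nn

class IntentGraphNet(nn.Module):
    def __init__(self, d_in=16, d_hidden=64, n_rounds=2):
        super().__init__()
        self.proj = nn.Linear(d_in, d_hidden)
        self.msg = nn.Linear(d_hidden, d_hidden)
        self.cls = nn.Linear(d_hidden, 2)          # crossing vs. not crossing
        self.n_rounds = n_rounds

    def forward(self, nodes, adj):                 # nodes: (N, d_in); adj: (N, N)
        h = torch.relu(self.proj(nodes))
        for _ in range(self.n_rounds):
            # Aggregate neighbour messages weighted by the adjacency matrix.
            h = torch.relu(h + adj @ self.msg(h))
        return self.cls(h[0])                      # logits for the pedestrian node (index 0 assumed)
```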
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.