Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From
Aerial Images
- URL: http://arxiv.org/abs/2305.11856v2
- Date: Wed, 20 Sep 2023 00:09:13 GMT
- Title: Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From
Aerial Images
- Authors: Yunpeng Liu, Vasileios Lioutas, Jonathan Wilder Lavington, Matthew Niedoba, Justice Sefas, Setareh Dabiri, Dylan Green, Xiaoxuan Liang, Berend Zwartsenberg, Adam Ścibior, Frank Wood
- Abstract summary: We propose an aerial image-based map (AIM) representation that requires minimal annotation and provides rich road context information for traffic agents like pedestrians and vehicles.
Our results demonstrate competitive multi-agent trajectory prediction performance, especially for pedestrians in the scene, when using our AIM representation.
- Score: 14.689298253430568
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The development of algorithms that learn multi-agent behavioral models using
human demonstrations has led to increasingly realistic simulations in the field
of autonomous driving. In general, such models learn to jointly predict
trajectories for all controlled agents by exploiting road context information
such as drivable lanes obtained from manually annotated high-definition (HD)
maps. Recent studies show that these models can greatly benefit from increasing
the amount of human data available for training. However, the manual annotation
of HD maps, which is necessary for every new location, creates a bottleneck for
efficiently scaling up human traffic datasets. We propose an aerial image-based
map (AIM) representation that requires minimal annotation and provides rich
road context information for traffic agents like pedestrians and vehicles. We
evaluate multi-agent trajectory prediction using the AIM by incorporating it
into a differentiable driving simulator as an image-texture-based
differentiable rendering module. Our results demonstrate competitive multi-agent trajectory prediction performance, especially for pedestrians in the scene, when using our AIM representation compared to models trained with rasterized HD maps.
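To make the image-texture-based differentiable rendering idea concrete, below is a minimal PyTorch sketch of how an agent-centered, pose-aligned crop could be sampled differentiably from an aerial image texture; the function name, crop scheme, and coordinate normalization are illustrative assumptions, not the paper's actual module.

```python
import torch
import torch.nn.functional as F

def render_aim_crop(aerial: torch.Tensor,
                    pose_xy: torch.Tensor,
                    yaw: torch.Tensor,
                    crop_frac: float = 0.25) -> torch.Tensor:
    """Differentiably crop a pose-aligned patch from an aerial image.

    aerial:    (1, C, H, W) image texture, spanning [-1, 1] in grid coords.
    pose_xy:   (2,) agent position, already normalized to [-1, 1].
    yaw:       () heading angle in radians.
    crop_frac: size of the crop relative to the full image.
    """
    cos, sin = torch.cos(yaw), torch.sin(yaw)
    # 2x3 affine: rotate by yaw, scale down to the crop, translate to the agent.
    theta = torch.stack([
        torch.stack([cos * crop_frac, -sin * crop_frac, pose_xy[0]]),
        torch.stack([sin * crop_frac,  cos * crop_frac, pose_xy[1]]),
    ]).unsqueeze(0)                                        # (1, 2, 3)
    grid = F.affine_grid(theta, size=(1, aerial.shape[1], 64, 64),
                         align_corners=False)
    # Bilinear sampling keeps the crop differentiable w.r.t. pose and yaw.
    return F.grid_sample(aerial, grid, align_corners=False)

# Gradients flow from the rendered crop back to the agent pose.
aerial = torch.rand(1, 3, 512, 512)
pose = torch.tensor([0.1, -0.2], requires_grad=True)
yaw = torch.tensor(0.3, requires_grad=True)
crop = render_aim_crop(aerial, pose, yaw)                  # (1, 3, 64, 64)
crop.sum().backward()
print(pose.grad, yaw.grad)
```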
Related papers
- MapsTP: HD Map Images Based Multimodal Trajectory Prediction for Automated Vehicles [8.229161517598373]
We leverage ResNet-50 to extract image features from high-definition map data and use IMU sensor data to calculate speed, acceleration, and yaw rate.
A temporal probabilistic network is employed to compute potential trajectories, selecting the most accurate and highly probable trajectory paths.
arXiv Detail & Related papers (2024-07-08T10:45:30Z)
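As a rough, hypothetical illustration of the pipeline summarized above, the sketch below fuses ResNet-50 features of a map image crop with IMU-derived kinematics (speed, acceleration, yaw rate); module names and dimensions are assumptions, and the temporal probabilistic network is omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MapKinematicsEncoder(nn.Module):
    """Hypothetical fusion of HD-map image features with IMU kinematics."""
    def __init__(self, kin_dim: int = 3, hidden: int = 128):
        super().__init__()
        backbone = resnet50(weights=None)   # untrained, for a self-contained demo
        backbone.fc = nn.Identity()         # keep the 2048-d pooled features
        self.backbone = backbone
        self.kin_mlp = nn.Sequential(nn.Linear(kin_dim, hidden), nn.ReLU())
        self.fuse = nn.Linear(2048 + hidden, hidden)

    def forward(self, map_img: torch.Tensor, kinematics: torch.Tensor):
        # map_img: (B, 3, 224, 224) rasterized HD-map crop
        # kinematics: (B, 3) speed, acceleration, yaw rate from the IMU
        img_feat = self.backbone(map_img)
        kin_feat = self.kin_mlp(kinematics)
        return self.fuse(torch.cat([img_feat, kin_feat], dim=-1))

enc = MapKinematicsEncoder()
ctx = enc(torch.rand(2, 3, 224, 224), torch.rand(2, 3))
print(ctx.shape)  # torch.Size([2, 128]) -- context fed to a trajectory head
```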
- MFTraj: Map-Free, Behavior-Driven Trajectory Prediction for Autonomous Driving [15.965681867350215]
This paper introduces a trajectory prediction model tailored for autonomous driving.
It harnesses historical trajectory data combined with a novel geometric dynamic graph-based behavior-aware module.
It achieves computational efficiency and reduced parameter overhead.
arXiv Detail & Related papers (2024-05-02T13:13:52Z)
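A minimal sketch of what a map-free dynamic graph over agents might look like: adjacency is rebuilt at each timestep from the agents' past positions alone. The radius threshold and tensor shapes are illustrative assumptions, not MFTraj's actual behavior-aware module.

```python
import torch

def dynamic_graph(histories: torch.Tensor, radius: float = 10.0) -> torch.Tensor:
    """Build a per-timestep adjacency from agent positions alone (no map).

    histories: (T, N, 2) past xy positions of N agents over T steps.
    Returns:   (T, N, N) boolean adjacency; agents closer than `radius`
               at step t are connected, which makes the graph dynamic.
    """
    diff = histories.unsqueeze(2) - histories.unsqueeze(1)   # (T, N, N, 2)
    dist = diff.norm(dim=-1)
    adj = dist < radius
    adj &= ~torch.eye(histories.shape[1], dtype=torch.bool)  # no self-loops
    return adj

hist = torch.rand(8, 5, 2) * 20.0    # 8 timesteps, 5 agents
print(dynamic_graph(hist).shape)     # torch.Size([8, 5, 5])
```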
- Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models to improve their driving quality by adding a loss term during training.
In contrast to previous work, our method does not require the salient semantic maps used for supervision to be available at test time.
arXiv Detail & Related papers (2024-04-30T23:18:51Z)
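A hedged sketch of the general idea of a training-only attention guidance loss: the model's attention logits are pulled toward a salient semantic mask with a cross-entropy-style objective, and the mask is simply not needed at inference. The exact loss used in the paper may differ; everything here is illustrative.

```python
import torch
import torch.nn.functional as F

def attention_guidance_loss(attn: torch.Tensor, salient: torch.Tensor) -> torch.Tensor:
    """Pull an attention map toward a salient semantic mask (training only).

    attn:    (B, H, W) unnormalized attention logits from the driving model.
    salient: (B, H, W) binary mask of salient regions.
    """
    b = attn.shape[0]
    log_p = F.log_softmax(attn.view(b, -1), dim=-1)
    target = salient.view(b, -1)
    target = target / target.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    return -(target * log_p).sum(dim=-1).mean()

loss = attention_guidance_loss(torch.randn(4, 16, 16),
                               (torch.rand(4, 16, 16) > 0.8).float())
print(loss.item())  # added to the imitation loss during training only
```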
- SemanticFormer: Holistic and Semantic Traffic Scene Representation for Trajectory Prediction using Knowledge Graphs [3.733790302392792]
Trajectory prediction in autonomous driving relies on an accurate representation of all relevant contexts of the driving scene.
We present SemanticFormer, an approach for predicting multimodal trajectories by reasoning over a traffic scene graph.
arXiv Detail & Related papers (2024-04-30T09:11:04Z)
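To give a flavor of reasoning over a traffic scene graph, the toy sketch below encodes (head, relation, tail) triples with one message-passing step. The entities, relations, and aggregation rule are invented for illustration and are not SemanticFormer's architecture.

```python
import torch
import torch.nn as nn

# A toy traffic scene graph as (head, relation, tail) triples.
ENTITIES = ["car_0", "ped_1", "lane_a", "lane_b", "crosswalk_c"]
RELATIONS = ["on", "adjacent_to", "crosses", "yields_to"]
triples = [("car_0", "on", "lane_a"),
           ("lane_a", "adjacent_to", "lane_b"),
           ("ped_1", "crosses", "crosswalk_c"),
           ("car_0", "yields_to", "ped_1")]

ent_idx = {e: i for i, e in enumerate(ENTITIES)}
rel_idx = {r: i for i, r in enumerate(RELATIONS)}
ent_emb = nn.Embedding(len(ENTITIES), 32)
rel_emb = nn.Embedding(len(RELATIONS), 32)

# One message-passing step: each head entity aggregates (relation + tail).
h = ent_emb.weight.clone()
for head, rel, tail in triples:
    msg = rel_emb(torch.tensor(rel_idx[rel])) + ent_emb(torch.tensor(ent_idx[tail]))
    h = h.index_add(0, torch.tensor([ent_idx[head]]), msg.unsqueeze(0))
print(h.shape)  # torch.Size([5, 32]) -- per-entity features for a predictor
```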
- Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.
We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios.
Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
arXiv Detail & Related papers (2023-12-07T18:53:27Z)
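A minimal sketch of traffic modeling as next-token prediction: per-step displacements are quantized against a small template vocabulary, and an autoregressive model is trained with a next-token cross-entropy loss. The nearest-template tokenizer is a toy stand-in, and a GRU substitutes for the paper's sequence model.

```python
import torch
import torch.nn as nn

# Toy motion vocabulary: K template displacements (assumed, not the paper's).
K, D = 64, 2
templates = torch.randn(K, D)

def tokenize(traj: torch.Tensor) -> torch.Tensor:
    """traj: (T, 2) positions -> (T-1,) token ids of per-step displacements."""
    steps = traj[1:] - traj[:-1]            # (T-1, 2)
    return torch.cdist(steps, templates).argmin(dim=-1)

class NextTokenModel(nn.Module):
    """Autoregressive model over motion tokens."""
    def __init__(self, vocab: int = K, width: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab, width)
        self.rnn = nn.GRU(width, width, batch_first=True)
        self.head = nn.Linear(width, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.rnn(self.emb(tokens))   # (B, T, width)
        return self.head(hidden)                 # logits over the next token

tokens = tokenize(torch.cumsum(torch.randn(12, 2), dim=0)).unsqueeze(0)
model = NextTokenModel()
logits = model(tokens[:, :-1])                   # teacher forcing
loss = nn.functional.cross_entropy(logits.reshape(-1, K), tokens[:, 1:].reshape(-1))
print(loss.item())
```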
- TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction [149.5716746789134]
We show that data-driven traffic simulation can be formulated as a world model.
We present TrafficBots, a multi-agent policy built upon motion prediction and end-to-end driving.
Experiments on the Waymo Open Motion Dataset show TrafficBots can simulate realistic multi-agent behaviors.
arXiv Detail & Related papers (2023-03-07T18:28:41Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework designed for policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
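The second stage above optimizes a photometric error; below is a minimal sketch of such a loss, assuming the sampling grid induced by the predicted depth and ego-motion has already been computed (that projection is omitted, and an identity grid stands in for it in the demo).

```python
import torch
import torch.nn.functional as F

def photometric_error(target: torch.Tensor,
                      source: torch.Tensor,
                      flow_grid: torch.Tensor) -> torch.Tensor:
    """Self-supervised photometric loss between a frame and a warped neighbor.

    target:    (B, 3, H, W) current frame.
    source:    (B, 3, H, W) adjacent frame.
    flow_grid: (B, H, W, 2) sampling grid induced by predicted depth and
               ego-motion (its computation is omitted in this sketch).
    """
    warped = F.grid_sample(source, flow_grid, align_corners=False)
    return (warped - target).abs().mean()

B, H, W = 2, 64, 64
# Identity grid as a stand-in for a real depth/ego-motion warp.
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W),
                        indexing="ij")
grid = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2)
print(photometric_error(torch.rand(B, 3, H, W),
                        torch.rand(B, 3, H, W), grid).item())
```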
- Fully End-to-end Autonomous Driving with Semantic Depth Cloud Mapping and Multi-Agent [2.512827436728378]
We propose a novel deep learning model trained in an end-to-end, multi-task manner to perform both perception and control tasks simultaneously.
The model is evaluated on the CARLA simulator in various scenarios comprising normal and adversarial situations under different weather conditions to mimic the real world.
arXiv Detail & Related papers (2022-04-12T03:57:01Z)
- Geo-Context Aware Study of Vision-Based Autonomous Driving Models and Spatial Video Data [9.883009014227815]
Vision-based deep learning (DL) methods have made great progress in learning autonomous driving models from large-scale crowd-sourced video datasets.
We develop a geo-context aware visualization system for the study of Autonomous Driving Model (ADM) predictions together with large-scale ADM video data.
arXiv Detail & Related papers (2021-08-20T17:33:54Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)
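As a hedged sketch of the polyline subgraph idea, the code below turns an ordered polyline into consecutive (start, end) vectors, applies a shared MLP, and max-pools them into a single polyline feature; the widths and vector attributes are simplified assumptions.

```python
import torch
import torch.nn as nn

class PolylineEncoder(nn.Module):
    """Encode one polyline as a set of (start, end) vectors: a shared MLP per
    vector followed by permutation-invariant max-pooling over the polyline."""
    def __init__(self, width: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4, width), nn.ReLU(),
                                 nn.Linear(width, width))

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (P, 2) ordered polyline points (lane boundary or trajectory).
        vectors = torch.cat([points[:-1], points[1:]], dim=-1)  # (P-1, 4)
        return self.mlp(vectors).max(dim=0).values

enc = PolylineEncoder()
lane = torch.tensor([[0.0, 0.0], [1.0, 0.1], [2.0, 0.3], [3.0, 0.6]])
print(enc(lane).shape)  # torch.Size([64]) -- one feature per polyline
```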