VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized
Representation
- URL: http://arxiv.org/abs/2005.04259v1
- Date: Fri, 8 May 2020 19:07:03 GMT
- Title: VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized
Representation
- Authors: Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong
Li, Cordelia Schmid
- Abstract summary: This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
- Score: 74.56282712099274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Behavior prediction in dynamic, multi-agent systems is an important problem
in the context of self-driving cars, due to the complex representations and
interactions of road components, including moving agents (e.g. pedestrians and
vehicles) and road context information (e.g. lanes, traffic lights). This paper
introduces VectorNet, a hierarchical graph neural network that first exploits
the spatial locality of individual road components represented by vectors and
then models the high-order interactions among all components. In contrast to
most recent approaches, which render trajectories of moving agents and road
context information as bird's-eye view images and encode them with convolutional
neural networks (ConvNets), our approach operates on a vector representation.
By operating on the vectorized high definition (HD) maps and agent
trajectories, we avoid lossy rendering and computationally intensive ConvNet
encoding steps. To further boost VectorNet's capability in learning context
features, we propose a novel auxiliary task to recover the randomly masked out
map entities and agent trajectories based on their context. We evaluate
VectorNet on our in-house behavior prediction benchmark and the recently
released Argoverse forecasting dataset. Our method achieves performance on par
with or better than the competitive rendering approach on both benchmarks,
while saving over 70% of the model parameters and reducing FLOPs by an order
of magnitude. It also outperforms the state of the art on the Argoverse dataset.
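The hierarchical encoding and the masked-entity auxiliary task described above can be made concrete with a short sketch. The following PyTorch code is a minimal illustration, not the authors' reference implementation: the layer sizes, the vector feature layout, and the names SubgraphLayer and VectorNetEncoder are assumptions made here for clarity.

```python
# A minimal, hypothetical sketch of the VectorNet encoding pipeline described
# in the abstract. All dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn


class SubgraphLayer(nn.Module):
    """One polyline subgraph layer: encode each vector with a shared MLP,
    max-pool over the polyline, and concatenate the pooled context back
    onto every vector (doubling the feature width)."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_polylines, num_vectors, in_dim)
        h = self.mlp(x)
        pooled = h.max(dim=1, keepdim=True).values        # per-polyline context
        return torch.cat([h, pooled.expand_as(h)], dim=-1)


class VectorNetEncoder(nn.Module):
    def __init__(self, vec_dim: int = 8, hidden: int = 64):
        super().__init__()
        # Hierarchical encoder: local polyline subgraphs first, then a global
        # fully connected interaction graph realized as self-attention.
        self.subgraph = nn.Sequential(
            SubgraphLayer(vec_dim, hidden),
            SubgraphLayer(2 * hidden, hidden),
        )
        self.global_attn = nn.MultiheadAttention(2 * hidden, num_heads=4,
                                                 batch_first=True)
        # Auxiliary head: reconstruct masked-out node features from context.
        self.recover = nn.Linear(2 * hidden, 2 * hidden)

    def forward(self, polylines: torch.Tensor, mask: torch.Tensor):
        # polylines: (P, V, vec_dim), each vector assumed to hold start/end
        # coordinates plus attributes; mask: (P,) bool, True = hide this node.
        nodes = self.subgraph(polylines).max(dim=1).values  # (P, 2*hidden)
        target = nodes.detach()                             # recovery target
        nodes = nodes.masked_fill(mask.unsqueeze(-1), 0.0)  # mask node features
        ctx = nodes.unsqueeze(0)                            # batch of one scene
        out, _ = self.global_attn(ctx, ctx, ctx)
        out = out.squeeze(0)
        # Auxiliary loss: recover the masked node features from global context.
        aux_loss = ((self.recover(out[mask]) - target[mask]) ** 2).mean()
        return out, aux_loss


# Usage: a scene with 10 polylines of 9 vectors each, masking two entities.
enc = VectorNetEncoder()
scene = torch.randn(10, 9, 8)
mask = torch.zeros(10, dtype=torch.bool)
mask[3] = mask[7] = True
features, aux = enc(scene, mask)
print(features.shape, aux.item())  # torch.Size([10, 128]) and a scalar loss
```

The design choice mirrored here is the one the abstract emphasizes: local geometry is summarized per polyline before any global message passing, so no bird's-eye view rasterization or ConvNet encoding step is needed.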
Related papers
- TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes [58.180556221044235]
We present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception.
Our formulation is designed for dynamic scenes, consisting of small moving objects or human actions.
We evaluate its performance on challenging datasets, including Okutama Action and UG2.
arXiv Detail & Related papers (2024-05-04T21:55:33Z) - Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From
Aerial Images [14.689298253430568]
We propose an aerial image-based map (AIM) representation that requires minimal annotation and provides rich road context information for traffic agents like pedestrians and vehicles.
Our results demonstrate competitive multi-agent trajectory prediction performance, especially for pedestrians in the scene, when using our AIM representation.
arXiv Detail & Related papers (2023-05-19T17:48:01Z) - TSGN: Temporal Scene Graph Neural Networks with Projected Vectorized
Representation for Multi-Agent Motion Prediction [2.5780349894383807]
TSGN can predict multimodal future trajectories for all agents simultaneously, plausibly, and accurately.
We propose a Hierarchical Lane Transformer for capturing interactions between agents and the road network.
Experiments show TSGN achieves state-of-the-art performance on the Argoverse motion forecasting benchmark.
arXiv Detail & Related papers (2023-05-14T15:58:55Z) - GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting [121.42898228997538]
We propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization.
We leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph; a minimal sketch of this idea appears after this list.
Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction.
arXiv Detail & Related papers (2022-11-04T16:10:50Z) - RSG-Net: Towards Rich Sematic Relationship Prediction for Intelligent
Vehicle in Complex Environments [72.04891523115535]
We propose RSG-Net (Road Scene Graph Net): a graph convolutional network designed to predict potential semantic relationships from object proposals.
The experimental results indicate that this network, trained on the Road Scene Graph dataset, can efficiently predict potential semantic relationships among objects around the ego-vehicle.
arXiv Detail & Related papers (2022-07-16T12:40:17Z) - HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory
Prediction via Scene Encoding [76.9165845362574]
We propose a backbone modelling the driving scene as a heterogeneous graph with different types of nodes and edges.
For spatial relation encoding, the coordinates of each node, as well as those of its in-edges, are expressed in the local node-centric coordinate system.
Experimental results show that HDGT achieves state-of-the-art performance for the task of trajectory prediction.
arXiv Detail & Related papers (2022-04-30T07:08:30Z) - Trajectory Prediction with Graph-based Dual-scale Context Fusion [43.51107329748957]
We present a graph-based trajectory prediction network named the Dual Scale Predictor (DSP).
It encodes both the static and dynamic driving context in a hierarchical manner.
Thanks to the proposed dual-scale context fusion network, our DSP is able to generate accurate and human-like multi-modal trajectories.
arXiv Detail & Related papers (2021-11-02T13:42:16Z) - Decoder Fusion RNN: Context and Interaction Aware Decoders for
Trajectory Prediction [53.473846742702854]
We propose a recurrent, attention-based approach for motion forecasting.
Decoder Fusion RNN (DF-RNN) is composed of a recurrent behavior encoder, an inter-agent multi-headed attention module, and a context-aware decoder.
We demonstrate the efficacy of our method by testing it on the Argoverse motion forecasting dataset and show its state-of-the-art performance on the public benchmark.
arXiv Detail & Related papers (2021-08-12T15:53:37Z) - Exploiting latent representation of sparse semantic layers for improved
short-term motion prediction with Capsule Networks [0.12183405753834559]
This paper explores the use of Capsule Networks (CapsNets) to learn a hierarchical representation of sparse semantic layers corresponding to small regions of the High-Definition (HD) map.
By using an architecture based on CapsNets, the model retains hierarchical relationships between detected features within images whilst also preventing the loss of spatial data often caused by the pooling operation.
We show that our model achieves significant improvement over recently published works on prediction, whilst drastically reducing the overall size of the network.
arXiv Detail & Related papers (2021-03-02T11:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.