VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized
Representation
- URL: http://arxiv.org/abs/2005.04259v1
- Date: Fri, 8 May 2020 19:07:03 GMT
- Title: VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized
Representation
- Authors: Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong
Li, Cordelia Schmid
- Abstract summary: This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
- Score: 74.56282712099274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Behavior prediction in dynamic, multi-agent systems is an important problem
in the context of self-driving cars, due to the complex representations and
interactions of road components, including moving agents (e.g. pedestrians and
vehicles) and road context information (e.g. lanes, traffic lights). This paper
introduces VectorNet, a hierarchical graph neural network that first exploits
the spatial locality of individual road components represented by vectors and
then models the high-order interactions among all components. In contrast to
most recent approaches, which render trajectories of moving agents and road
context information as bird's-eye view images and encode them with convolutional
neural networks (ConvNets), our approach operates on a vector representation.
By operating on the vectorized high definition (HD) maps and agent
trajectories, we avoid lossy rendering and computationally intensive ConvNet
encoding steps. To further boost VectorNet's capability in learning context
features, we propose a novel auxiliary task to recover the randomly masked out
map entities and agent trajectories based on their context. We evaluate
VectorNet on our in-house behavior prediction benchmark and the recently
released Argoverse forecasting dataset. Our method achieves performance on par
with or better than the competitive rendering approach on both benchmarks,
while saving over 70% of the model parameters and reducing FLOPs by an order
of magnitude. It also outperforms the state of the art on the Argoverse dataset.
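The hierarchical encoding and the masked-entity auxiliary task described above can be made concrete with a short sketch. The following PyTorch code is a minimal illustration, not the authors' reference implementation: the layer sizes, the vector feature layout, and the names SubgraphLayer and VectorNetEncoder are assumptions made here for clarity.

```python
# A minimal, hypothetical sketch of the VectorNet encoding pipeline described
# in the abstract. All dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn


class SubgraphLayer(nn.Module):
    """One polyline subgraph layer: encode each vector with a shared MLP,
    max-pool over the polyline, and concatenate the pooled context back
    onto every vector (doubling the feature width)."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_polylines, num_vectors, in_dim)
        h = self.mlp(x)
        pooled = h.max(dim=1, keepdim=True).values        # per-polyline context
        return torch.cat([h, pooled.expand_as(h)], dim=-1)


class VectorNetEncoder(nn.Module):
    def __init__(self, vec_dim: int = 8, hidden: int = 64):
        super().__init__()
        # Hierarchical encoder: local polyline subgraphs first, then a global
        # fully connected interaction graph realized as self-attention.
        self.subgraph = nn.Sequential(
            SubgraphLayer(vec_dim, hidden),
            SubgraphLayer(2 * hidden, hidden),
        )
        self.global_attn = nn.MultiheadAttention(2 * hidden, num_heads=4,
                                                 batch_first=True)
        # Auxiliary head: reconstruct masked-out node features from context.
        self.recover = nn.Linear(2 * hidden, 2 * hidden)

    def forward(self, polylines: torch.Tensor, mask: torch.Tensor):
        # polylines: (P, V, vec_dim), each vector assumed to hold start/end
        # coordinates plus attributes; mask: (P,) bool, True = hide this node.
        nodes = self.subgraph(polylines).max(dim=1).values  # (P, 2*hidden)
        target = nodes.detach()                             # recovery target
        nodes = nodes.masked_fill(mask.unsqueeze(-1), 0.0)  # mask node features
        ctx = nodes.unsqueeze(0)                            # batch of one scene
        out, _ = self.global_attn(ctx, ctx, ctx)
        out = out.squeeze(0)
        # Auxiliary loss: recover the masked node features from global context.
        aux_loss = ((self.recover(out[mask]) - target[mask]) ** 2).mean()
        return out, aux_loss


# Usage: a scene with 10 polylines of 9 vectors each, masking two entities.
enc = VectorNetEncoder()
scene = torch.randn(10, 9, 8)
mask = torch.zeros(10, dtype=torch.bool)
mask[3] = mask[7] = True
features, aux = enc(scene, mask)
print(features.shape, aux.item())  # torch.Size([10, 128]) and a scalar loss
```

The design choice mirrored here is the one the abstract emphasizes: local geometry is summarized per polyline before any global message passing, so no bird's-eye view rasterization or ConvNet encoding step is needed.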
Related papers
- TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes [58.180556221044235]
We present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception.
Our formulation is designed for dynamic scenes, consisting of small moving objects or human actions.
We evaluate its performance on challenging datasets, including Okutama Action and UG2.
arXiv Detail & Related papers (2024-05-04T21:55:33Z) - Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From
Aerial Images [14.689298253430568]
We propose an aerial image-based map (AIM) representation that requires minimal annotation and provides rich road context information for traffic agents like pedestrians and vehicles.
Our results demonstrate competitive multi-agent trajectory prediction performance, especially for pedestrians in the scene, when using our AIM representation.
arXiv Detail & Related papers (2023-05-19T17:48:01Z) - TSGN: Temporal Scene Graph Neural Networks with Projected Vectorized
Representation for Multi-Agent Motion Prediction [2.5780349894383807]
TSGN can predict multimodal future trajectories for all agents simultaneously, plausibly, and accurately.
We propose a Hierarchical Lane Transformer for capturing interactions between agents and the road network.
Experiments show TSGN achieves state-of-the-art performance on the Argoverse motion forecasting benchmark.
arXiv Detail & Related papers (2023-05-14T15:58:55Z) - GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting [121.42898228997538]
We propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization.
We leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph; a minimal sketch of this idea appears after this list.
Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction.
arXiv Detail & Related papers (2022-11-04T16:10:50Z) - RSG-Net: Towards Rich Sematic Relationship Prediction for Intelligent
Vehicle in Complex Environments [72.04891523115535]
We propose RSG-Net (Road Scene Graph Net): a graph convolutional network designed to predict potential semantic relationships from object proposals.
The experimental results indicate that this network, trained on the Road Scene Graph dataset, can efficiently predict potential semantic relationships among objects around the ego-vehicle.
arXiv Detail & Related papers (2022-07-16T12:40:17Z) - HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory
Prediction via Scene Encoding [76.9165845362574]
We propose a backbone modelling the driving scene as a heterogeneous graph with different types of nodes and edges.
For spatial relation encoding, the coordinates of each node, as well as those of its in-edges, are expressed in the local node-centric coordinate system.
Experimental results show that HDGT achieves state-of-the-art performance for the task of trajectory prediction.
arXiv Detail & Related papers (2022-04-30T07:08:30Z) - Trajectory Prediction with Graph-based Dual-scale Context Fusion [43.51107329748957]
We present a graph-based trajectory prediction network named the Dual Scale Predictor (DSP).
It encodes both the static and dynamic driving context in a hierarchical manner.
Thanks to the proposed dual-scale context fusion network, our DSP is able to generate accurate and human-like multi-modal trajectories.
arXiv Detail & Related papers (2021-11-02T13:42:16Z) - Decoder Fusion RNN: Context and Interaction Aware Decoders for
Trajectory Prediction [53.473846742702854]
We propose a recurrent, attention-based approach for motion forecasting.
Decoder Fusion RNN (DF-RNN) is composed of a recurrent behavior encoder, an inter-agent multi-headed attention module, and a context-aware decoder.
We demonstrate the efficacy of our method by testing it on the Argoverse motion forecasting dataset and show its state-of-the-art performance on the public benchmark.
arXiv Detail & Related papers (2021-08-12T15:53:37Z) - Exploiting latent representation of sparse semantic layers for improved
short-term motion prediction with Capsule Networks [0.12183405753834559]
This paper explores the use of Capsule Networks (CapsNets) to learn a hierarchical representation of sparse semantic layers corresponding to small regions of the High-Definition (HD) map.
By using an architecture based on CapsNets, the model retains hierarchical relationships between detected features within images whilst also preventing the loss of spatial data often caused by the pooling operation.
We show that our model achieves significant improvement over recently published works on prediction, whilst drastically reducing the overall size of the network.
arXiv Detail & Related papers (2021-03-02T11:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.