HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory
Prediction via Scene Encoding
- URL: http://arxiv.org/abs/2205.09753v2
- Date: Thu, 20 Jul 2023 08:41:46 GMT
- Title: HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory
Prediction via Scene Encoding
- Authors: Xiaosong Jia, Penghao Wu, Li Chen, Yu Liu, Hongyang Li, Junchi Yan
- Abstract summary: We propose a backbone modelling the driving scene as a heterogeneous graph with different types of nodes and edges.
For spatial relation encoding, the coordinates of each node and its in-edges are expressed in the local node-centric coordinate system.
Experimental results show that HDGT achieves state-of-the-art performance for the task of trajectory prediction.
- Score: 76.9165845362574
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Encoding a driving scene into vector representations has been an essential
task for autonomous driving that can benefit downstream tasks, e.g., trajectory
prediction. The driving scene often involves heterogeneous elements, such as
different types of objects (agents, lanes, traffic signs), and the semantic
relations between objects are rich and diverse. Meanwhile, there also exists
relativity across elements: the spatial relation is a relative concept and needs
to be encoded in an ego-centric manner instead of in a global coordinate system.
Based on these observations, we propose the Heterogeneous Driving Graph
Transformer (HDGT), a backbone modelling the driving scene as a heterogeneous
graph with different types of nodes and edges. For heterogeneous graph
construction, we connect different types of nodes according to diverse semantic
relations. For spatial relation encoding, the coordinates of each node as well
as its in-edges are expressed in the local node-centric coordinate system. For
the aggregation module in the graph neural network (GNN), we adopt the
transformer structure in a hierarchical way to fit the heterogeneous nature of
the inputs. Experimental results show that HDGT achieves state-of-the-art
performance for the task of trajectory prediction, on the INTERACTION Prediction
Challenge and the Waymo Open Motion Challenge.
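To make two ideas from the abstract concrete (connecting heterogeneous nodes by semantic relations, and expressing spatial relations in each node's local frame), the following is a minimal illustrative Python/NumPy sketch. It is not the authors' implementation; the node names, relation labels, and the 2-D rigid transform are assumptions made for demonstration only.

```python
import numpy as np

# Toy heterogeneous scene: each node has a type and a global pose (x, y, heading);
# edges carry a semantic relation type (names are illustrative assumptions).
nodes = {
    "agent_0": {"type": "agent", "pose": np.array([ 5.0,  2.0, 0.3])},
    "agent_1": {"type": "agent", "pose": np.array([12.0, -1.0, 3.0])},
    "lane_7":  {"type": "lane",  "pose": np.array([ 8.0,  0.0, 0.0])},
    "sign_3":  {"type": "sign",  "pose": np.array([15.0,  4.0, 0.0])},
}
edges = [  # (source, target, semantic relation)
    ("lane_7", "agent_0", "agent_on_lane"),
    ("agent_1", "agent_0", "agent_near_agent"),
    ("sign_3", "lane_7", "sign_controls_lane"),
]

def to_local(pose_global, ref_pose):
    """Express a global (x, y, heading) pose in the reference node's local frame."""
    dx, dy = pose_global[:2] - ref_pose[:2]
    c, s = np.cos(-ref_pose[2]), np.sin(-ref_pose[2])
    x_loc = c * dx - s * dy
    y_loc = s * dx + c * dy
    return np.array([x_loc, y_loc, pose_global[2] - ref_pose[2]])

# Node-centric encoding: for every node, re-express the source of each in-edge
# in that node's local coordinate system, grouped by relation type so that a
# type-specific aggregator could consume each group separately.
local_in_edges = {name: {} for name in nodes}
for src, dst, rel in edges:
    feat = to_local(nodes[src]["pose"], nodes[dst]["pose"])
    local_in_edges[dst].setdefault(rel, []).append((src, feat))

for dst, groups in local_in_edges.items():
    for rel, items in groups.items():
        for src, feat in items:
            print(f"{dst} <- {src} [{rel}]: local (x, y, dtheta) = {np.round(feat, 2)}")
```

In the full HDGT model, each relation type would feed a transformer-based aggregation applied hierarchically over the heterogeneous inputs; the grouping by relation type above only marks where that hierarchy would attach.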
Related papers
- Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation [153.92387500677023]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The proposed graph Transformer encoder combines graph convolutions and self-attentions in a Transformer to model both local and global interactions.
We also propose a novel self-guided pre-training method for graph representation learning.
arXiv Detail & Related papers (2024-01-15T14:36:38Z)
- Graph Transformer GANs for Graph-Constrained House Generation [223.739067413952]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The GTGAN learns effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task.
arXiv Detail & Related papers (2023-03-14T20:35:45Z)
- SCENE: Reasoning about Traffic Scenes using Heterogeneous Graph Neural Networks [12.038268908198287]
SCENE is a methodology to encode diverse traffic scenes in heterogeneous graphs.
Task-specific decoders can be applied to predict desired attributes of the scene.
arXiv Detail & Related papers (2023-01-09T17:05:28Z)
- Trajectory Prediction with Graph-based Dual-scale Context Fusion [43.51107329748957]
We present a graph-based trajectory prediction network named the Dual Scale Predictor (DSP).
It encodes both the static and dynamic driving context in a hierarchical manner.
Thanks to the proposed dual-scale context fusion network, our DSP is able to generate accurate and human-like multi-modal trajectories.
arXiv Detail & Related papers (2021-11-02T13:42:16Z)
- Learning Lane Graph Representations for Motion Forecasting [92.88572392790623]
We construct a lane graph from raw map data to preserve the map structure.
We exploit a fusion network consisting of four types of interactions, actor-to-lane, lane-to-lane, lane-to-actor and actor-to-actor.
Our approach significantly outperforms the state-of-the-art on the large scale Argoverse motion forecasting benchmark.
arXiv Detail & Related papers (2020-07-27T17:59:49Z)
- Graph Optimal Transport for Cross-Domain Alignment [121.80313648519203]
Cross-domain alignment is fundamental to computer vision and natural language processing.
We propose Graph Optimal Transport (GOT), a principled framework that germinates from recent advances in Optimal Transport (OT).
Experiments show consistent outperformance of GOT over baselines across a wide range of tasks.
arXiv Detail & Related papers (2020-06-26T01:14:23Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)
- CoMoGCN: Coherent Motion Aware Trajectory Prediction with Graph Representation [12.580809204729583]
We propose a novel framework, coherent motion aware graph convolutional network (CoMoGCN), for trajectory prediction in crowded scenes with group constraints.
Our method achieves state-of-the-art performance on several different trajectory prediction benchmarks, and the best average performance among all benchmarks considered.
arXiv Detail & Related papers (2020-05-02T09:10:30Z)