PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network
- URL: http://arxiv.org/abs/2404.10263v1
- Date: Tue, 16 Apr 2024 03:34:35 GMT
- Title: PreGSU-A Generalized Traffic Scene Understanding Model for Autonomous Driving based on Pre-trained Graph Attention Network
- Authors: Yuning Wang, Zhiyuan Liu, Haotian Lin, Junkai Jiang, Shaobing Xu, Jianqiang Wang
- Abstract summary: Scene understanding, defined as the learning, extraction, and representation of interactions among traffic elements, is one of the critical challenges toward high-level autonomous driving (AD).
Current scene understanding methods mainly focus on one concrete single task, such as trajectory prediction or risk level evaluation.
We propose PreGSU, a generalized pre-trained scene understanding model based on a graph attention network, to learn the universal interaction and reasoning of traffic scenes and support various downstream tasks.
- Score: 23.38434020807342
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Scene understanding, defined as the learning, extraction, and representation of interactions among traffic elements, is one of the critical challenges toward high-level autonomous driving (AD). Current scene understanding methods mainly focus on one concrete single task, such as trajectory prediction or risk level evaluation. Although they perform well on specific metrics, their generalization ability is insufficient to adapt to real traffic complexity and the diversity of downstream demands. In this study, we propose PreGSU, a generalized pre-trained scene understanding model based on a graph attention network, which learns the universal interaction and reasoning of traffic scenes to support various downstream tasks. After the feature engineering and sub-graph module, all elements are embedded as nodes to form a dynamic weighted graph. Then, four graph attention layers are applied to learn the relationships among agents and lanes. In the pre-training phase, the understanding model is trained on two self-supervised tasks: Virtual Interaction Force (VIF) modeling and Masked Road Modeling (MRM). Based on artificial potential field theory, VIF modeling enables PreGSU to capture agent-to-agent interactions, while MRM extracts agent-to-road connections. In the fine-tuning process, the pre-trained parameters are loaded to derive detailed understanding outputs. We conduct validation experiments on two downstream tasks, i.e., trajectory prediction in an urban scenario and intention recognition in a highway scenario, to verify the generalization and understanding abilities. Results show that, compared with the baselines, PreGSU achieves better accuracy on both tasks, indicating its potential to generalize to various scenes and targets. An ablation study shows the effectiveness of the pre-training task design.
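The core mechanism the abstract describes (stacked graph attention layers over a weighted scene graph) can be sketched in plain Python. This is a minimal, illustrative single-head attention layer in the style of standard graph attention networks, not the authors' implementation; the function name `gat_layer`, its parameter shapes, and the LeakyReLU slope are assumptions of the sketch, and the VIF and MRM pre-training losses are not shown.

```python
import math

def gat_layer(h, adj, W, a, leak=0.2):
    """Minimal single-head graph attention layer (illustrative sketch).
    h:   N x F list of node feature rows (agents and lane nodes)
    adj: N x N 0/1 adjacency, assumed to include self-loops
    W:   F x Fp projection matrix
    a:   length-2*Fp attention vector scoring concatenated pairs [z_i || z_j]
    """
    def project(x):  # x @ W for one feature row
        return [sum(x[k] * W[k][j] for k in range(len(x))) for j in range(len(W[0]))]

    z = [project(row) for row in h]            # project all node features
    out = []
    for i in range(len(h)):
        nbrs = [j for j in range(len(h)) if adj[i][j]]
        logits = []
        for j in nbrs:
            pair = z[i] + z[j]                 # concatenate [z_i || z_j]
            e = sum(pair[k] * a[k] for k in range(len(pair)))
            logits.append(e if e > 0 else leak * e)  # LeakyReLU
        m = max(logits)                        # stable softmax over neighbours
        w = [math.exp(e - m) for e in logits]
        s = sum(w)
        alpha = [x / s for x in w]             # normalised attention weights
        out.append([sum(alpha[t] * z[nbrs[t]][j] for t in range(len(nbrs)))
                    for j in range(len(z[0]))])
    return out
```

In an MRM-style pretext task, a subset of lane-node feature rows would be zeroed out before the forward pass and the model trained to reconstruct them from the attended neighbourhood; the sketch above only shows the attention aggregation itself.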
Related papers
- SemanticFormer: Holistic and Semantic Traffic Scene Representation for Trajectory Prediction using Knowledge Graphs [3.733790302392792]
Trajectory prediction in autonomous driving relies on accurate representation of all relevant contexts of the driving scene.
We present SemanticFormer, an approach for predicting multimodal trajectories by reasoning over a traffic scene graph.
arXiv Detail & Related papers (2024-04-30T09:11:04Z) - DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems.
We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z) - What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z) - SEPT: Towards Efficient Scene Representation Learning for Motion Prediction [19.111948522155004]
This paper presents SEPT, a modeling framework that leverages self-supervised learning to develop powerful models for complex traffic scenes.
Experiments demonstrate that SEPT, without elaborate architectural design or feature engineering, achieves state-of-the-art performance on the Argoverse 1 and Argoverse 2 motion forecasting benchmarks.
arXiv Detail & Related papers (2023-09-26T21:56:03Z) - Unsupervised Self-Driving Attention Prediction via Uncertainty Mining and Knowledge Embedding [51.8579160500354]
We propose an unsupervised way to predict self-driving attention by uncertainty modeling and driving knowledge integration.
Results show equivalent or even more impressive performance compared to fully-supervised state-of-the-art approaches.
arXiv Detail & Related papers (2023-03-17T00:28:33Z) - Towards Explainable Motion Prediction using Heterogeneous Graph Representations [3.675875935838632]
Motion prediction systems aim to capture the future behavior of traffic scenarios, enabling autonomous vehicles to perform safe and efficient planning.
GNN-based approaches have recently gained attention as they are well suited to naturally model these interactions.
In this work, we aim to improve the explainability of motion prediction systems by using different approaches.
arXiv Detail & Related papers (2022-12-07T17:43:42Z) - RSG-Net: Towards Rich Semantic Relationship Prediction for Intelligent Vehicle in Complex Environments [72.04891523115535]
We propose RSG-Net (Road Scene Graph Net): a graph convolutional network designed to predict potential semantic relationships from object proposals.
The experimental results indicate that this network, trained on Road Scene Graph dataset, could efficiently predict potential semantic relationships among objects around the ego-vehicle.
arXiv Detail & Related papers (2022-07-16T12:40:17Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Interaction-Based Trajectory Prediction Over a Hybrid Traffic Graph [4.574413934477815]
We propose to use a hybrid graph whose nodes represent both the traffic actors as well as the static and dynamic traffic elements present in the scene.
The different modes of temporal interaction (e.g., stopping and going) among actors and traffic elements are explicitly modeled by graph edges.
We show that our proposed model, TrafficGraphNet, achieves state-of-the-art trajectory prediction accuracy while maintaining a high level of interpretability.
arXiv Detail & Related papers (2020-09-27T18:20:03Z) - Implicit Latent Variable Model for Scene-Consistent Motion Forecasting [78.74510891099395]
In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data.
We model the scene as an interaction graph and employ powerful graph neural networks to learn a distributed latent representation of the scene.
arXiv Detail & Related papers (2020-07-23T14:31:25Z) - Scenario-Transferable Semantic Graph Reasoning for Interaction-Aware Probabilistic Prediction [29.623692599892365]
Accurately predicting the possible behaviors of traffic participants is an essential capability for autonomous vehicles.
We propose a novel generic representation for various driving environments by taking advantage of semantics and domain knowledge.
arXiv Detail & Related papers (2020-04-07T00:34:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.