GITSR: Graph Interaction Transformer-based Scene Representation for Multi Vehicle Collaborative Decision-making
- URL: http://arxiv.org/abs/2411.01608v1
- Date: Sun, 03 Nov 2024 15:27:26 GMT
- Title: GITSR: Graph Interaction Transformer-based Scene Representation for Multi Vehicle Collaborative Decision-making
- Authors: Xingyu Hu, Lijun Zhang, Dejian Meng, Ye Han, Lisha Yuan
- Abstract summary: This study focuses on efficient scene representation and the modeling of spatial interaction behaviors of traffic states.
In this study, we propose GITSR, an effective framework for Graph Interaction Transformer-based Scene Representation.
- Score: 9.910230703889956
- Abstract: In this study, we propose GITSR, an effective framework for Graph Interaction Transformer-based Scene Representation for multi-vehicle collaborative decision-making in intelligent transportation systems. In mixed traffic, where Connected Automated Vehicles (CAVs) and Human Driving Vehicles (HDVs) coexist, the framework aims to improve CAVs' understanding of the environment and thereby their decision-making, focusing on efficient scene representation and on modeling the spatial interaction behaviors of traffic states. We first extract features of the driving environment against the background of intelligent networking. The local scene representation, which is based on an agent-centric, dynamic occupation grid, is then computed by a Transformer module. In addition, the feasible region of the map is captured through a multi-head attention mechanism to reduce vehicle collisions. Spatial interaction behaviors, based on motion information, are modeled as graph structures and extracted via a Graph Neural Network (GNN). Finally, collaborative decision-making among multiple vehicles is formulated as a Markov Decision Process (MDP), with driving actions output by Reinforcement Learning (RL) algorithms. We validate the algorithm on a highly challenging highway off-ramp scenario, and the results support the advantage of the agent-centric approach to scene representation. Simulation results show that GITSR both captures the scene representation and extracts spatial interaction data effectively, outperforming the baseline method across the comparative metrics.
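As a rough illustration of two ideas named in the abstract, the sketch below builds an agent-centric occupancy grid and a distance-based interaction graph. It is not the authors' implementation: the function names, the grid dimensions, the 30 m interaction radius, and the use of positions alone (rather than full motion information) are all assumptions made for this example.

```python
import math

def to_agent_frame(ego, other):
    """Express other's (x, y) in ego's local frame (x forward, y left)."""
    dx, dy = other[0] - ego[0], other[1] - ego[1]
    cos_h, sin_h = math.cos(ego[2]), math.sin(ego[2])
    return (cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy)

def occupancy_grid(ego, others, half_len=20.0, half_wid=6.0, cell=4.0):
    """Binary grid centred on ego; rows index local x, columns index local y.

    The extents and cell size are hypothetical values chosen for illustration.
    """
    rows, cols = int(2 * half_len / cell), int(2 * half_wid / cell)
    grid = [[0] * cols for _ in range(rows)]
    for o in others:
        x, y = to_agent_frame(ego, o)
        if -half_len <= x < half_len and -half_wid <= y < half_wid:
            grid[int((x + half_len) // cell)][int((y + half_wid) // cell)] = 1
    return grid

def interaction_edges(vehicles, radius=30.0):
    """Undirected edges between vehicles closer than radius (an assumed rule)."""
    edges = []
    for i in range(len(vehicles)):
        for j in range(i + 1, len(vehicles)):
            if math.dist(vehicles[i][:2], vehicles[j][:2]) < radius:
                edges.append((i, j))
    return edges

# Hypothetical vehicle states: (x, y, heading in radians)
vehicles = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 4.0, 0.0), (100.0, 0.0, 0.0)]
grid = occupancy_grid(vehicles[0], vehicles[1:])
edges = interaction_edges(vehicles)
```

In a full pipeline the grid would presumably be fed to a Transformer encoder and the edge list to a GNN. Here they only show what the two inputs might look like for a single ego vehicle.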
Related papers
- SocialFormer: Social Interaction Modeling with Edge-enhanced Heterogeneous Graph Transformers for Trajectory Prediction [3.733790302392792]
SocialFormer is an agent interaction-aware trajectory prediction method.
We present a temporal encoder based on gated recurrent units (GRU) to model the temporal social behavior of agent movements.
We evaluate SocialFormer for the trajectory prediction task on the popular nuScenes benchmark and achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-05-06T19:47:23Z)
- GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving [16.245949174447574]
We propose the Interaction Scene Graph (ISG) as a unified method to model the interactions among the ego-vehicle, road agents, and map elements.
We evaluate the proposed method for end-to-end autonomous driving on the nuScenes dataset.
arXiv Detail & Related papers (2024-03-28T02:22:28Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
- A Deeply Supervised Semantic Segmentation Method Based on GAN [9.441379867578332]
The proposed model integrates a generative adversarial network (GAN) framework into the traditional semantic segmentation model.
The effectiveness of our approach is demonstrated by a significant boost in performance on the road crack dataset.
arXiv Detail & Related papers (2023-10-06T08:22:24Z)
- Graph-Based Interaction-Aware Multimodal 2D Vehicle Trajectory Prediction using Diffusion Graph Convolutional Networks [17.989423104706397]
This study presents the Graph-based Interaction-aware Multi-modal Trajectory Prediction framework.
Within this framework, vehicles' motions are conceptualized as nodes in a time-varying graph, and the traffic interactions are represented by a dynamic adjacency matrix.
We employ a driving intention-specific feature fusion, enabling the adaptive integration of historical and future embeddings.
arXiv Detail & Related papers (2023-09-05T06:28:13Z)
- Social Occlusion Inference with Vectorized Representation for Autonomous Driving [0.0]
This paper introduces a novel social occlusion inference approach that learns a mapping from agent trajectories and scene context to an occupancy grid map (OGM) representing the view of the ego vehicle.
To verify the performance of vectorized representation, we design a baseline based on a fully transformer encoder-decoder architecture.
We evaluate our approach on an unsignalized intersection in the INTERACTION dataset, where it outperforms state-of-the-art results.
arXiv Detail & Related papers (2023-03-18T10:44:39Z)
- RSG-Net: Towards Rich Sematic Relationship Prediction for Intelligent Vehicle in Complex Environments [72.04891523115535]
We propose RSG-Net (Road Scene Graph Net): a graph convolutional network designed to predict potential semantic relationships from object proposals.
The experimental results indicate that this network, trained on the Road Scene Graph dataset, can efficiently predict potential semantic relationships among objects around the ego-vehicle.
arXiv Detail & Related papers (2022-07-16T12:40:17Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our methods is validated on complex quadruped robot dynamics, and the methods can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
- Implicit Latent Variable Model for Scene-Consistent Motion Forecasting [78.74510891099395]
In this paper, we aim to learn scene-consistent motion forecasts of complex urban traffic directly from sensor data.
We model the scene as an interaction graph and employ powerful graph neural networks to learn a distributed latent representation of the scene.
arXiv Detail & Related papers (2020-07-23T14:31:25Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)
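The vectorized representation described in the VectorNet entry above can be illustrated with a minimal sketch: each pair of consecutive points on a polyline (a lane boundary or an agent trajectory) becomes one vector carrying its start point, end point, attributes, and the id of the polyline it belongs to. The function name and the attribute dictionary are hypothetical, and the sketch covers only the input encoding, not the hierarchical graph network itself.

```python
def polyline_to_vectors(points, attributes, polyline_id):
    """Turn consecutive points into (start, end, attributes, polyline_id) vectors."""
    return [
        (points[k], points[k + 1], attributes, polyline_id)
        for k in range(len(points) - 1)
    ]

# Hypothetical lane centreline with three sampled points
lane = [(0.0, 0.0), (5.0, 0.0), (10.0, 1.0)]
vectors = polyline_to_vectors(lane, {"type": "lane"}, polyline_id=0)
```

A polyline with n points yields n - 1 vectors, so the three-point lane above produces two.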