Social Occlusion Inference with Vectorized Representation for Autonomous
Driving
- URL: http://arxiv.org/abs/2303.10385v2
- Date: Wed, 16 Aug 2023 06:50:24 GMT
- Title: Social Occlusion Inference with Vectorized Representation for Autonomous
Driving
- Authors: Bochao Huang and Pin
- Abstract summary: This paper introduces a novel social occlusion inference approach that learns a mapping from agent trajectories and scene context to an occupancy grid map (OGM) representing the view of the ego vehicle.
To verify the performance of the vectorized representation, we design a baseline based on a fully transformer-based encoder-decoder architecture.
We evaluate our approach on an unsignalized intersection in the INTERACTION dataset, where it outperforms state-of-the-art results.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous vehicles must be capable of handling occlusion of the
environment to ensure safe and efficient driving. In urban environments,
occlusion often arises because other vehicles obscure the perception of the
ego vehicle. Since occlusion conditions can impact the trajectories of
vehicles, the behavior of other vehicles is helpful for making inferences
about the occlusion as a remedy for perceptual deficiencies. This paper
introduces a novel social occlusion inference approach that learns a mapping
from agent trajectories and scene context to an occupancy grid map (OGM)
representing the view of the ego vehicle. Specifically, vectorized features
are encoded through a polyline encoder, which aggregates the features of
vectors into features of polylines. A transformer module is then used to
model the high-order interactions of polylines. Importantly, occlusion
queries are proposed to fuse polyline features and generate the OGM without
any visual input. To verify the performance of the vectorized representation,
we design a baseline based on a fully transformer-based encoder-decoder
architecture that maps the OGM with occlusion and historical trajectory
information to the ground-truth OGM. We evaluate our approach on an
unsignalized intersection in the INTERACTION dataset, where it outperforms
state-of-the-art results.
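The pipeline described in the abstract (polyline encoding, transformer interaction modeling, occlusion queries decoded into an OGM) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all weights are random placeholders, the attention is single-head, and the grid size, feature dimensions, and the one-query-per-cell decoding are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def polyline_encoder(vectors, W):
    # vectors: (num_vectors, feat_dim) belonging to one polyline.
    # Shared per-vector linear + ReLU, then max-pooling aggregates the
    # features of vectors into a single polyline feature.
    h = np.maximum(vectors @ W, 0.0)
    return h.max(axis=0)

def attention(Q, K, V):
    # Scaled dot-product attention with a numerically stable softmax.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

d = 16
W_enc = rng.normal(size=(4, d))  # per-vector feature dim 4 -> d

# Three hypothetical polylines (agent trajectories / lane segments),
# each a variable-length sequence of vectors.
polylines = [rng.normal(size=(n, 4)) for n in (5, 3, 7)]
P = np.stack([polyline_encoder(p, W_enc) for p in polylines])  # (3, d)

# Transformer-style self-attention models high-order polyline interactions.
P = attention(P, P, P)

# Learned occlusion queries cross-attend to polyline features and are
# projected to per-cell occupancy logits (one query per OGM cell here).
grid_h, grid_w = 8, 8
queries = rng.normal(size=(grid_h * grid_w, d))
W_out = rng.normal(size=(d, 1))
ogm_logits = (attention(queries, P, P) @ W_out).reshape(grid_h, grid_w)
ogm = 1.0 / (1.0 + np.exp(-ogm_logits))  # occupancy probabilities in [0, 1]
```

Note that no visual modality enters the sketch: the OGM is decoded purely from trajectory- and map-derived polyline features, which is the core idea of the occlusion queries.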
Related papers
- GITSR: Graph Interaction Transformer-based Scene Representation for Multi Vehicle Collaborative Decision-making [9.910230703889956]
This study focuses on efficient scene representation and the modeling of spatial interaction behaviors of traffic states.
In this study, we propose GITSR, an effective framework for Graph Interaction Transformer-based Scene Representation.
arXiv Detail & Related papers (2024-11-03T15:27:26Z)
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Autonomous Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.
Experiments conducted on nuScenes dataset demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- SocialFormer: Social Interaction Modeling with Edge-enhanced Heterogeneous Graph Transformers for Trajectory Prediction [3.733790302392792]
SocialFormer is an agent interaction-aware trajectory prediction method.
We present a temporal encoder based on gated recurrent units (GRU) to model the temporal social behavior of agent movements.
We evaluate SocialFormer for the trajectory prediction task on the popular nuScenes benchmark and achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-05-06T19:47:23Z)
- GraphAD: Interaction Scene Graph for End-to-end Autonomous Driving [16.245949174447574]
We propose the Interaction Scene Graph (ISG) as a unified method to model the interactions among the ego-vehicle, road agents, and map elements.
We evaluate the proposed method for end-to-end autonomous driving on the nuScenes dataset.
arXiv Detail & Related papers (2024-03-28T02:22:28Z)
- Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks.
We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers.
By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
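The KNARPE mechanism summarized above can be illustrated with a toy sketch: each agent attends only over its k nearest neighbors, and the pairwise-relative pose (rather than absolute coordinates) is injected into the keys. This is an assumed reading of the one-sentence summary, not the paper's actual implementation; the projection `W_rel` and the additive encoding are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def knn_relative_attention(feats, poses, k, W_rel):
    # feats: (n, d) agent features; poses: (n, 2) agent positions.
    n, d = feats.shape
    out = np.zeros_like(feats)
    for i in range(n):
        # Restrict attention to the k nearest neighbors (including self).
        dists = np.linalg.norm(poses - poses[i], axis=-1)
        nbrs = np.argsort(dists)[:k]
        # Pairwise-relative poses make the module translation-invariant.
        rel = poses[nbrs] - poses[i]
        keys = feats[nbrs] + rel @ W_rel  # inject relative pose encoding
        scores = keys @ feats[i] / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ feats[nbrs]
    return out

n, d = 6, 8
feats = rng.normal(size=(n, d))
poses = rng.normal(size=(n, 2))
W_rel = rng.normal(size=(2, d))
updated = knn_relative_attention(feats, poses, k=3, W_rel=W_rel)
```

Because only relative poses enter the computation, translating every agent by the same offset leaves the output unchanged, which is what lets contexts be shared among agents and reused when the scene moves with the ego vehicle.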
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
- NMR: Neural Manifold Representation for Autonomous Driving [2.2596039727344452]
We propose a representation for autonomous driving that learns to infer semantics and predict way-points on a manifold over a finite horizon.
We do this using an iterative attention mechanism applied on a latent high dimensional embedding of surround monocular images and partial ego-vehicle state.
We propose a sampling algorithm based on edge-adaptive coverage loss of BEV occupancy grid to generate the surface manifold.
arXiv Detail & Related papers (2022-05-11T14:58:08Z)
- HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding [76.9165845362574]
We propose a backbone modelling the driving scene as a heterogeneous graph with different types of nodes and edges.
For spatial relation encoding, the coordinates of the node as well as its in-edges are in the local node-centric coordinate system.
Experimental results show that HDGT achieves state-of-the-art performance for the task of trajectory prediction.
arXiv Detail & Related papers (2022-04-30T07:08:30Z)
- Decoder Fusion RNN: Context and Interaction Aware Decoders for Trajectory Prediction [53.473846742702854]
We propose a recurrent, attention-based approach for motion forecasting.
Decoder Fusion RNN (DF-RNN) is composed of a recurrent behavior encoder, an inter-agent multi-headed attention module, and a context-aware decoder.
We demonstrate the efficacy of our method by testing it on the Argoverse motion forecasting dataset and show its state-of-the-art performance on the public benchmark.
arXiv Detail & Related papers (2021-08-12T15:53:37Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
- VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation [74.56282712099274]
This paper introduces VectorNet, a hierarchical graph neural network that exploits the spatial locality of individual road components represented by vectors.
By operating on the vectorized high definition (HD) maps and agent trajectories, we avoid lossy rendering and computationally intensive ConvNet encoding steps.
We evaluate VectorNet on our in-house behavior prediction benchmark and the recently released Argoverse forecasting dataset.
arXiv Detail & Related papers (2020-05-08T19:07:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.