NEAT: Neural Attention Fields for End-to-End Autonomous Driving
- URL: http://arxiv.org/abs/2109.04456v1
- Date: Thu, 9 Sep 2021 17:55:28 GMT
- Title: NEAT: Neural Attention Fields for End-to-End Autonomous Driving
- Authors: Kashyap Chitta, Aditya Prakash, Andreas Geiger
- Abstract summary: We present NEural ATtention fields (NEAT), a novel representation that enables efficient reasoning for imitation learning models.
NEAT is a continuous function which maps locations in Bird's Eye View (BEV) scene coordinates to waypoints and semantics.
In a new evaluation setting involving adverse environmental conditions and challenging scenarios, NEAT outperforms several strong baselines and achieves driving scores on par with the privileged CARLA expert.
- Score: 59.60483620730437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient reasoning about the semantic, spatial, and temporal structure of a
scene is a crucial prerequisite for autonomous driving. We present NEural
ATtention fields (NEAT), a novel representation that enables such reasoning for
end-to-end imitation learning models. NEAT is a continuous function which maps
locations in Bird's Eye View (BEV) scene coordinates to waypoints and
semantics, using intermediate attention maps to iteratively compress
high-dimensional 2D image features into a compact representation. This allows
our model to selectively attend to relevant regions in the input while ignoring
information irrelevant to the driving task, effectively associating the images
with the BEV representation. In a new evaluation setting involving adverse
environmental conditions and challenging scenarios, NEAT outperforms several
strong baselines and achieves driving scores on par with the privileged CARLA
expert used to generate its training data. Furthermore, visualizing the
attention maps for models with NEAT intermediate representations provides
improved interpretability.
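To make the idea in the abstract concrete, the following is a minimal PyTorch sketch of an attention-field module in the spirit of NEAT. It is not the authors' implementation; the module sizes, token count, iteration count, and class count are placeholder assumptions. The field is queried at a BEV location (x, y, t), iteratively compresses flattened image features into a compact vector via predicted attention, and decodes a semantic class and a 2D waypoint offset.

```python
# Hypothetical sketch of a NEAT-style neural attention field (not the paper's code).
# An MLP maps a BEV query point plus the current compact feature to attention over
# image-feature tokens; the attended feature is fed back in for a few iterations,
# then decoded into semantics and a waypoint offset for that query location.
import torch
import torch.nn as nn


class NeuralAttentionField(nn.Module):
    def __init__(self, feat_dim=128, num_tokens=64, hidden=128,
                 num_classes=6, num_iters=2):
        super().__init__()
        self.num_iters = num_iters
        # attention field: (query point, compact feature) -> logits over tokens
        self.attn_mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_tokens),
        )
        # decoder: (query point, compact feature) -> semantic class + waypoint offset
        self.dec_mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes + 2),
        )

    def forward(self, tokens, query):
        # tokens: (B, num_tokens, feat_dim) flattened image features from an encoder
        # query:  (B, 3) BEV location and time (x, y, t)
        compact = tokens.mean(dim=1)              # initial compact feature
        for _ in range(self.num_iters):
            logits = self.attn_mlp(torch.cat([query, compact], dim=-1))
            attn = logits.softmax(dim=-1)         # (B, num_tokens) attention map
            compact = torch.bmm(attn.unsqueeze(1), tokens).squeeze(1)
        out = self.dec_mlp(torch.cat([query, compact], dim=-1))
        semantics, offset = out[:, :-2], out[:, -2:]
        return semantics, offset, attn


# usage: query the field at an arbitrary BEV coordinate
tokens = torch.randn(4, 64, 128)                  # stand-in for encoder output
query = torch.tensor([[1.0, 5.0, 0.5]]).repeat(4, 1)
semantics, offset, attn = NeuralAttentionField()(tokens, query)
```

Because the field is a continuous function of the query point, it can be sampled densely to visualize attention maps and semantics, or sparsely at only the locations needed for driving.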
Related papers
- VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization [108.68014173017583]
Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car.
We propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in the tokenized discrete space.
Thanks to the obtained BEV tokens, accompanied by a codebook embedding that encapsulates the semantics of different BEV elements in the ground-truth maps, we are able to directly align the sparse backbone image features with the BEV tokens.
arXiv Detail & Related papers (2024-11-03T16:09:47Z)
- Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment [2.3575550107698016]
We introduce an AV-centric spatio-temporal attention encoding (STAE) mechanism for learning dynamic interactions with different surrounding vehicles.
To understand map and route context, we employ a context encoder to extract context maps.
The resulting model is trained using the Soft Actor Critic (SAC) algorithm.
arXiv Detail & Related papers (2024-07-12T02:34:44Z)
- TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes [58.180556221044235]
We present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception.
Our formulation is designed for dynamic scenes, consisting of small moving objects or human actions.
We evaluate its performance on challenging datasets, including Okutama Action and UG2.
arXiv Detail & Related papers (2024-05-04T21:55:33Z)
- Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models to improve their driving quality by adding a loss term during training.
In contrast to previous work, our method does not require the salient semantic maps used for attention guidance to be available at test time.
arXiv Detail & Related papers (2024-04-30T23:18:51Z)
- Optimizing Ego Vehicle Trajectory Prediction: The Graph Enhancement Approach [1.3931837019950217]
We advocate for the use of Bird's Eye View perspectives, which offer unique advantages in capturing spatial relationships and object homogeneity.
In our work, we leverage Graph Neural Networks (GNNs) and positional encoding to represent objects in a BEV, achieving competitive performance compared to traditional methods.
arXiv Detail & Related papers (2023-12-20T15:22:34Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, making it efficient and applicable to real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- Ground then Navigate: Language-guided Navigation in Dynamic Scenes [13.870303451896248]
We investigate the Vision-and-Language Navigation (VLN) problem in the context of autonomous driving in outdoor settings.
We solve the problem by explicitly grounding the navigable regions corresponding to the textual command.
We provide extensive qualitative and quantitative empirical results to validate the efficacy of the proposed approach.
arXiv Detail & Related papers (2022-09-24T09:51:09Z)
- NMR: Neural Manifold Representation for Autonomous Driving [2.2596039727344452]
We propose a representation for autonomous driving that learns to infer semantics and predict waypoints on a manifold over a finite horizon.
We do this using an iterative attention mechanism applied to a latent high-dimensional embedding of surround monocular images and partial ego-vehicle state.
We propose a sampling algorithm based on an edge-adaptive coverage loss of the BEV occupancy grid to generate the surface manifold.
arXiv Detail & Related papers (2022-05-11T14:58:08Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.