NMR: Neural Manifold Representation for Autonomous Driving
- URL: http://arxiv.org/abs/2205.05551v1
- Date: Wed, 11 May 2022 14:58:08 GMT
- Title: NMR: Neural Manifold Representation for Autonomous Driving
- Authors: Unnikrishnan R. Nair, Sarthak Sharma, Midhun S. Menon, Srikanth
Vidapanakal
- Abstract summary: We propose a representation for autonomous driving that learns to infer semantics and predict way-points on a manifold over a finite horizon.
We do this using an iterative attention mechanism applied to a latent, high-dimensional embedding of surround monocular images and partial ego-vehicle state.
We propose a sampling algorithm based on an edge-adaptive coverage loss of the BEV occupancy grid to generate the surface manifold.
- Score: 2.2596039727344452
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Autonomous driving requires efficient reasoning about the
spatio-temporal nature of the semantics of the scene. Recent approaches have
successfully amalgamated the traditional modular architecture of an autonomous
driving stack, comprising perception, prediction, and planning, into an
end-to-end trainable system. Such a system calls for a shared latent space
embedding with an interpretable, trainable intermediate projected
representation. One such successfully deployed representation is the
Bird's-Eye View (BEV) representation of the scene in the ego-frame. However, a
fundamental assumption for an undistorted
BEV is the local coplanarity of the world around the ego-vehicle. This
assumption is highly restrictive, as roads, in general, do have gradients. The
resulting distortions make path planning inefficient and incorrect. To overcome
this limitation, we propose Neural Manifold Representation (NMR), a
representation for the task of autonomous driving that learns to infer
semantics and predict way-points on a manifold over a finite horizon, centered
on the ego-vehicle. We do this using an iterative attention mechanism applied
to a latent, high-dimensional embedding of surround monocular images and partial
ego-vehicle state. This representation helps generate motion and behavior plans
consistent with and cognizant of the surface geometry. We propose a sampling
algorithm based on an edge-adaptive coverage loss of the BEV occupancy grid and
an associated guidance flow field to generate the surface manifold while incurring
minimal computational overhead. We aim to test the efficacy of our approach on
CARLA and SYNTHIA-SF.
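The abstract's core mechanism, iterative attention over a latent embedding of surround images and ego state, resembles a Perceiver-style readout. The sketch below is only an illustration of that idea under assumed module choices and dimensions; it is not the authors' implementation.

```python
# Illustrative sketch (assumption, not the paper's code): a Perceiver-style
# iterative cross-attention readout over surround-camera tokens plus an
# embedded partial ego-vehicle state.
import torch
import torch.nn as nn

class IterativeAttentionReadout(nn.Module):
    def __init__(self, d_model=256, n_latents=64, n_iters=4, n_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, d_model))  # learned queries
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.n_iters = n_iters

    def forward(self, image_tokens, ego_state):
        # image_tokens: (B, N, d_model) flattened surround-camera features
        # ego_state:    (B, 1, d_model) embedded partial ego-vehicle state
        context = torch.cat([image_tokens, ego_state], dim=1)
        z = self.latents.unsqueeze(0).expand(context.shape[0], -1, -1)
        for _ in range(self.n_iters):
            # Iteratively cross-attend from the latent array into the fused
            # context, then refine the latents with self-attention.
            z = self.norm1(z + self.cross_attn(z, context, context)[0])
            z = self.norm2(z + self.self_attn(z, z, z)[0])
        return z  # decoded downstream into manifold semantics and way-points
```

A separate decoder head (not shown) would map these latents to per-location semantics and way-points on the manifold.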
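The edge-adaptive sampling idea can be pictured as biasing manifold vertex samples toward occupancy edges, where surface geometry changes fastest, while keeping a uniform floor for coverage. The Sobel-based density below is a hedged guess at the spirit of that step, not the paper's algorithm; the guidance flow field is omitted.

```python
# Illustrative sketch (assumption, not the paper's algorithm): sample BEV
# grid cells with probability proportional to occupancy edge strength,
# mixed with a uniform component so flat regions are still covered.
import torch
import torch.nn.functional as F

def edge_adaptive_samples(occupancy, n_samples=1024, uniform_mix=0.1):
    # occupancy: (H, W) BEV occupancy grid with values in [0, 1]
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    sobel_y = sobel_x.t()
    grid = occupancy[None, None]  # (1, 1, H, W) for conv2d
    gx = F.conv2d(grid, sobel_x[None, None], padding=1)
    gy = F.conv2d(grid, sobel_y[None, None], padding=1)
    edge = (gx.pow(2) + gy.pow(2)).sqrt().flatten()
    # Edge-proportional density with a uniform floor for coverage.
    probs = (1 - uniform_mix) * edge / edge.sum().clamp_min(1e-8)
    probs = probs + uniform_mix / probs.numel()
    idx = torch.multinomial(probs, n_samples, replacement=True)
    H, W = occupancy.shape
    return torch.stack([idx // W, idx % W], dim=-1)  # (n_samples, 2) row/col
```

The sampled cells would then be lifted to 3D and meshed to form the surface manifold.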
Related papers
- DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Autonomous Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.
Experiments conducted on nuScenes dataset demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
- QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving [33.609780917199394]
Self-driving vehicles must understand their environment to determine appropriate actions.
Traditional systems rely on object detection to find agents in the scene.
We present a unified, interpretable, and efficient autonomy framework that moves away from cascading modules that first perceive, then predict, and finally plan; instead, the planner queries occupancy at relevant spatio-temporal points.
arXiv Detail & Related papers (2024-04-01T21:11:43Z)
- Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants.
Existing works either perform object detection followed by trajectory forecasting of the detected objects, or predict dense occupancy and flow grids for the whole scene.
This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z)
- Social Occlusion Inference with Vectorized Representation for Autonomous Driving [0.0]
This paper introduces a novel social occlusion inference approach that learns a mapping from agent trajectories and scene context to an occupancy grid map (OGM) representing the view of the ego vehicle.
To verify the performance of vectorized representation, we design a baseline based on a fully transformer encoder-decoder architecture.
We evaluate our approach on an unsignalized intersection in the INTERACTION dataset, where it outperforms state-of-the-art results.
arXiv Detail & Related papers (2023-03-18T10:44:39Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context.
We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird's-eye-view semantic data to enhance contextual representation.
Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
arXiv Detail & Related papers (2022-10-13T05:56:20Z)
- NEAT: Neural Attention Fields for End-to-End Autonomous Driving [59.60483620730437]
We present NEural ATtention fields (NEAT), a novel representation that enables efficient reasoning for imitation learning models.
NEAT is a continuous function which maps locations in Bird's Eye View (BEV) scene coordinates to waypoints and semantics (a minimal sketch of such a function follows this list).
In a new evaluation setting involving adverse environmental conditions and challenging scenarios, NEAT outperforms several strong baselines and achieves driving scores on par with the privileged CARLA expert.
arXiv Detail & Related papers (2021-09-09T17:55:28Z)
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [59.60483620730437]
We propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention.
Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
arXiv Detail & Related papers (2021-04-19T11:48:13Z)
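For the NEAT entry above, the "continuous function from BEV coordinates to waypoints and semantics" can be pictured as a queryable field head, as in the sketch below. This is an assumption for illustration only; NEAT's actual architecture computes its latent via attention and differs in detail.

```python
# Hedged sketch of a NEAT-style continuous BEV field (illustrative, not the
# published architecture): an MLP maps a query location plus a scene latent
# to semantic class logits and a way-point offset at that location.
import torch
import torch.nn as nn

class NeatStyleField(nn.Module):
    def __init__(self, d_latent=256, n_classes=6, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 + d_latent, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.semantics = nn.Linear(hidden, n_classes)  # per-location class logits
        self.waypoint = nn.Linear(hidden, 2)           # per-location offset (dx, dy)

    def forward(self, xy, latent):
        # xy:     (B, P, 2) query locations in BEV scene coordinates
        # latent: (B, d_latent) scene encoding, broadcast to every query point
        lat = latent[:, None].expand(-1, xy.shape[1], -1)
        h = self.mlp(torch.cat([xy, lat], dim=-1))
        return self.semantics(h), self.waypoint(h)
```

Because the field is continuous, it can be queried at arbitrary BEV locations rather than on a fixed grid.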