Monocular BEV Perception of Road Scenes via Front-to-Top View Projection
- URL: http://arxiv.org/abs/2211.08144v1
- Date: Tue, 15 Nov 2022 13:52:41 GMT
- Title: Monocular BEV Perception of Road Scenes via Front-to-Top View Projection
- Authors: Wenxi Liu, Qi Li, Weixiang Yang, Jiaxin Cai, Yuanlong Yu, Yuexin Ma,
Shengfeng He, Jia Pan
- Abstract summary: We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, making it efficient enough for real-time panoramic HD map reconstruction.
- Score: 57.19891435386843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: HD map reconstruction is crucial for autonomous driving. LiDAR-based
methods are limited by expensive sensors and time-consuming computation.
Camera-based methods usually need to perform road segmentation and view
transformation separately, which often causes distortion and missing content.
To push the limits of camera-based approaches, we present a novel framework
that reconstructs a local map, formed by road layout and vehicle occupancy, in
the bird's-eye view given only a front-view monocular image. We propose a
front-to-top view projection (FTVP) module, which takes the constraint of cycle
consistency between views into account and makes full use of their correlation
to strengthen the view transformation and scene understanding. In addition, we
apply multi-scale FTVP modules to propagate the rich spatial information of
low-level features and mitigate spatial deviation in the predicted object
locations. Experiments on public benchmarks show that our method achieves
state-of-the-art performance in road layout estimation, vehicle occupancy
estimation, and multi-class semantic estimation. For multi-class semantic
estimation, in particular, our model outperforms all competitors by a large
margin. Furthermore, our model runs at 25 FPS on a single GPU, making it
efficient enough for real-time panoramic HD map reconstruction.
Related papers
- CASPFormer: Trajectory Prediction from BEV Images with Deformable
Attention [4.9349065371630045]
We propose Context Aware Scene Prediction Transformer (CASPFormer), which can perform multi-modal motion prediction from spatialized Bird's-Eye-View (BEV) images.
Our system can be integrated with any upstream perception module that is capable of generating BEV images.
We evaluate our model on the nuScenes dataset and show that it achieves state-of-the-art performance across multiple metrics.
arXiv Detail & Related papers (2024-09-26T12:37:22Z)
- Homography Guided Temporal Fusion for Road Line and Marking Segmentation [73.47092021519245]
Road lines and markings are frequently occluded in the presence of moving vehicles, shadows, and glare.
We propose a Homography Guided Fusion (HomoFusion) module to exploit temporally-adjacent video frames for complementary cues.
We show that exploiting available camera intrinsics and a ground-plane assumption for cross-frame correspondence can lead to a lightweight network with significantly improved speed and accuracy.
arXiv Detail & Related papers (2024-04-11T10:26:40Z)
- Pixel to Elevation: Learning to Predict Elevation Maps at Long Range using Images for Autonomous Offroad Navigation [10.898724668444125]
We present a learning-based approach capable of predicting terrain elevation maps at long range, in real time, using only onboard egocentric images.
We experimentally validate the applicability of our proposed approach for autonomous offroad robotic navigation in complex and unstructured terrain.
arXiv Detail & Related papers (2024-01-30T22:37:24Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' unbounded perception range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range-view method is able to surpass point-, voxel-, and multi-view-fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-09T16:13:27Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- NMR: Neural Manifold Representation for Autonomous Driving [2.2596039727344452]
We propose a representation for autonomous driving that learns to infer semantics and predict waypoints on a manifold over a finite horizon.
We do this using an iterative attention mechanism applied to a latent high-dimensional embedding of surround monocular images and partial ego-vehicle state.
We propose a sampling algorithm based on an edge-adaptive coverage loss over the BEV occupancy grid to generate the surface manifold.
arXiv Detail & Related papers (2022-05-11T14:58:08Z)
- Real Time Monocular Vehicle Velocity Estimation using Synthetic Data [78.85123603488664]
We look at the problem of estimating the velocity of road vehicles from a camera mounted on a moving car.
We propose a two-step approach where first an off-the-shelf tracker is used to extract vehicle bounding boxes and then a small neural network is used to regress the vehicle velocity.
arXiv Detail & Related papers (2021-09-16T13:10:27Z)
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images [4.449481309681663]
We present the first end-to-end learning approach for directly predicting dense panoptic segmentation maps in the Bird's-Eye-View (BEV).
Our architecture follows the top-down paradigm and incorporates a novel dense transformer module.
We derive a mathematical formulation for the sensitivity of the FV-BEV transformation which allows us to intelligently weight pixels in the BEV space.
arXiv Detail & Related papers (2021-08-06T17:59:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.