A Simple Baseline for BEV Perception Without LiDAR
- URL: http://arxiv.org/abs/2206.07959v1
- Date: Thu, 16 Jun 2022 06:57:32 GMT
- Title: A Simple Baseline for BEV Perception Without LiDAR
- Authors: Adam W. Harley and Zhaoyuan Fang and Jie Li and Rares Ambrus and
Katerina Fragkiadaki
- Abstract summary: Building 3D perception systems for autonomous vehicles that do not rely on LiDAR is a critical research problem.
Current methods use multi-view RGB data collected from cameras around the vehicle.
We propose a simple baseline model, where the "lifting" step simply averages features from all projected image locations.
- Score: 37.00868568802673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building 3D perception systems for autonomous vehicles that do not rely on
LiDAR is a critical research problem because of the high expense of LiDAR
systems compared to cameras and other sensors. Current methods use multi-view
RGB data collected from cameras around the vehicle and neurally "lift" features
from the perspective images to the 2D ground plane, yielding a "bird's eye
view" (BEV) feature representation of the 3D space around the vehicle. Recent
research focuses on the way the features are lifted from images to the BEV
plane. We instead propose a simple baseline model, where the "lifting" step
simply averages features from all projected image locations, and find that it
outperforms the current state-of-the-art in BEV vehicle segmentation. Our
ablations show that batch size, data augmentation, and input resolution play a
large part in performance. Additionally, we reconsider the utility of radar
input, which has previously been either ignored or found non-helpful by recent
works. With a simple RGB-radar fusion module, we obtain a sizable boost in
performance, approaching the accuracy of a LiDAR-enabled system.
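The lift-and-average step described above is simple enough to sketch directly. Below is a minimal, illustrative reading of it in PyTorch: each 3D voxel centre of the BEV grid is projected into every camera, image features are bilinearly sampled at the projected locations, and the samples are averaged over the cameras whose frustum actually contains the voxel. The function name, tensor shapes, and calibration conventions (intrinsics assumed rescaled to the feature-map resolution) are assumptions made for the example, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def lift_and_average(img_feats, pix_T_cams, cams_T_ref, voxel_centers_ref):
    """Parameter-free "lifting" sketch: project every 3D voxel center into
    every camera, bilinearly sample image features there, and average the
    samples over the cameras that see the voxel.

    img_feats:         (S, C, Hf, Wf)  feature maps from S cameras
    pix_T_cams:        (S, 3, 3)       intrinsics, scaled to feature-map size
    cams_T_ref:        (S, 4, 4)       reference-frame -> camera extrinsics
    voxel_centers_ref: (N, 3)          voxel centers in the reference frame
    returns:           (N, C)          averaged voxel features
    """
    S, C, Hf, Wf = img_feats.shape
    N = voxel_centers_ref.shape[0]
    feats_sum = torch.zeros(N, C)
    counts = torch.zeros(N, 1)

    # homogeneous voxel coordinates in the reference frame
    xyz1 = torch.cat([voxel_centers_ref, torch.ones(N, 1)], dim=1)  # (N, 4)

    for s in range(S):
        # transform voxel centers into this camera's frame
        xyz_cam = (cams_T_ref[s] @ xyz1.T).T[:, :3]                 # (N, 3)
        in_front = xyz_cam[:, 2] > 0.1                              # in front of camera

        # pinhole projection to feature-map pixel coordinates
        uvw = (pix_T_cams[s] @ xyz_cam.T).T                         # (N, 3)
        uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)               # (N, 2)

        # normalize to [-1, 1] for grid_sample and mask out-of-view points
        grid = torch.stack([uv[:, 0] / (Wf - 1) * 2 - 1,
                            uv[:, 1] / (Hf - 1) * 2 - 1], dim=1)
        valid = in_front & (grid.abs() <= 1).all(dim=1)

        # bilinear sampling of the camera's feature map
        sampled = F.grid_sample(img_feats[s:s + 1],                 # (1, C, Hf, Wf)
                                grid.view(1, N, 1, 2),
                                align_corners=True).view(C, N).T    # (N, C)

        feats_sum += sampled * valid.unsqueeze(1)
        counts += valid.unsqueeze(1).float()

    # simple average over the cameras that observed each voxel
    return feats_sum / counts.clamp(min=1)
```

In a full pipeline, the resulting voxel features would be reshaped into a BEV grid and passed to a segmentation head; one simple way to realize the RGB-radar fusion mentioned in the abstract would be to rasterize radar returns onto the same grid and concatenate them channel-wise, though the abstract does not specify the module's design.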
Related papers
- RCBEVDet++: Toward High-accuracy Radar-Camera Fusion 3D Perception Network [34.45694077040797]
We present a radar-camera fusion 3D object detection framework called RCBEVDet.
RadarBEVNet encodes sparse radar points into a dense bird's-eye-view feature map.
Our method achieves state-of-the-art radar-camera fusion results in 3D object detection, BEV semantic segmentation, and 3D multi-object tracking tasks.
arXiv Detail & Related papers (2024-09-08T05:14:27Z)
- Better Monocular 3D Detectors with LiDAR from the Past [64.6759926054061]
Camera-based 3D detectors often suffer inferior performance compared to LiDAR-based counterparts due to inherent depth ambiguities in images.
In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data.
We show consistent and significant performance gain across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.
arXiv Detail & Related papers (2024-04-08T01:38:43Z)
- BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation [22.870994478494566]
We introduce BEVCar, a novel approach for joint BEV object and map segmentation.
The core novelty of our approach lies in first learning a point-based encoding of raw radar data.
We show that incorporating radar information significantly enhances robustness in challenging environmental conditions.
arXiv Detail & Related papers (2024-03-18T13:14:46Z)
- Multi-camera Bird's Eye View Perception for Autonomous Driving [17.834495597639805]
It is essential to produce perception outputs in 3D to enable spatial reasoning about other agents and structures.
The most basic approach to obtaining the desired BEV representation from a camera image is Inverse Perspective Mapping (IPM), which assumes a flat ground surface (a minimal homography sketch of this idea appears after this list).
More recent approaches use deep neural networks to output directly in BEV space.
arXiv Detail & Related papers (2023-09-16T19:12:05Z)
- RC-BEVFusion: A Plug-In Module for Radar-Camera Bird's Eye View Feature Fusion [11.646949644683755]
We present RC-BEVFusion, a modular radar-camera fusion network on the BEV plane.
We show significant performance gains, with up to a 28% increase in the nuScenes detection score.
arXiv Detail & Related papers (2023-05-25T09:26:04Z)
- BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios [51.285561119993105]
We present BEV-MAE, an efficient masked autoencoder pre-training framework for LiDAR-based 3D object detection in autonomous driving.
Specifically, we propose a bird's eye view (BEV) guided masking strategy to guide the 3D encoder learning feature representation.
We introduce a learnable point token to maintain a consistent receptive field size of the 3D encoder.
arXiv Detail & Related papers (2022-12-12T08:15:03Z)
- Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images [96.66271207089096]
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z)
- Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving [74.74519047735916]
3D human pose estimation (HPE) in autonomous vehicles (AV) differs from other use cases in many factors.
Data collected for other use cases (such as virtual reality, gaming, and animation) may not be usable for AV applications.
We propose one of the first approaches to alleviate this problem in the AV setting.
arXiv Detail & Related papers (2021-12-22T18:57:16Z)
- Recovering and Simulating Pedestrians in the Wild [81.38135735146015]
We propose to recover the shape and motion of pedestrians from sensor readings captured in the wild by a self-driving car driving around.
We incorporate the reconstructed pedestrian assets bank in a realistic 3D simulation system.
We show that the simulated LiDAR data can be used to significantly reduce the amount of real-world data required for visual perception tasks.
arXiv Detail & Related papers (2020-11-16T17:16:32Z)
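For reference, the Inverse Perspective Mapping (IPM) baseline mentioned in the multi-camera BEV survey entry above amounts to a single homography warp under a flat-ground assumption. The sketch below is illustrative only: the function name, BEV grid layout, and extrinsics convention are assumptions made for the example, and OpenCV is used only for the final warp.

```python
import numpy as np
import cv2  # used only for the perspective warp

def ipm_bev(image, K, R, t, bev_res=0.1, x_max=50.0, y_max=20.0):
    """Warp one camera image onto the flat ground plane z = 0 (classic IPM).

    image:   (H, W, 3) input image
    K:       (3, 3) camera intrinsics
    R, t:    extrinsics taking ground-frame points into the camera frame,
             i.e. x_cam = R @ x_ground + t  (assumed convention)
    bev_res: metres per BEV pixel
    x_max, y_max: forward and lateral extent of the BEV map in metres
    """
    # A ground point (X, Y, 0) projects as K @ (X*r1 + Y*r2 + t), so the
    # ground-plane -> image homography is K @ [r1 | r2 | t].
    H_ground_to_img = K @ np.column_stack([R[:, 0], R[:, 1], t])

    # Affine map from BEV pixel (col, row) to ground metres (X forward, Y left),
    # with the ego vehicle at the bottom-centre of the BEV image.
    n_rows = int(x_max / bev_res)
    n_cols = int(2 * y_max / bev_res)
    A_bev_to_ground = np.array([
        [0.0,      -bev_res, x_max],   # X = x_max - row * bev_res
        [-bev_res,  0.0,     y_max],   # Y = y_max - col * bev_res
        [0.0,       0.0,     1.0],
    ])

    # Homography mapping BEV pixels into the source image; WARP_INVERSE_MAP
    # tells OpenCV the matrix maps destination (BEV) pixels back to the source.
    H_bev_to_img = H_ground_to_img @ A_bev_to_ground
    return cv2.warpPerspective(image, H_bev_to_img, (n_cols, n_rows),
                               flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
```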