RangeRCNN: Towards Fast and Accurate 3D Object Detection with Range
Image Representation
- URL: http://arxiv.org/abs/2009.00206v2
- Date: Tue, 23 Mar 2021 06:53:11 GMT
- Title: RangeRCNN: Towards Fast and Accurate 3D Object Detection with Range
Image Representation
- Authors: Zhidong Liang, Ming Zhang, Zehan Zhang, Xian Zhao, Shiliang Pu
- Abstract summary: RangeRCNN is a novel and effective 3D object detection framework based on the range image representation.
In this paper, we utilize the dilated residual block (DRB) to better adapt to different object scales and obtain a more flexible receptive field.
Experiments show that RangeRCNN achieves state-of-the-art performance on the KITTI dataset and the Waymo Open dataset.
- Score: 35.6155506566957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present RangeRCNN, a novel and effective 3D object detection framework
based on the range image representation. Most existing methods are voxel-based
or point-based. Though several optimizations have been introduced to ease the
sparsity issue and reduce running time, the two representations are still
computationally inefficient. Compared to them, the range image representation
is dense and compact, which allows it to exploit powerful 2D convolutions. Even so, the
range image is not preferred in 3D object detection due to scale variation and
occlusion. In this paper, we utilize the dilated residual block (DRB) to better
adapt to different object scales and obtain a more flexible receptive field.
Considering scale variation and occlusion, we propose the RV-PV-BEV (range
view-point view-bird's eye view) module to transfer features from RV to BEV.
The anchors are defined in BEV, which avoids scale variation and occlusion.
Neither RV nor BEV can provide enough information for height estimation;
therefore, we propose a two-stage RCNN for better 3D detection performance. The
aforementioned point view not only serves as a bridge from RV to BEV but also
provides pointwise features for RCNN. Experiments show that RangeRCNN achieves
state-of-the-art performance on the KITTI dataset and the Waymo Open dataset,
and provides more possibilities for real-time 3D object detection. We further
introduce and discuss the data augmentation strategy for range-image-based
methods, which will be valuable for future research on range images.
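The range image at the core of the method is a standard spherical projection of the LiDAR sweep. As a minimal sketch (not code from the paper), the function below maps each point (x, y, z) to a pixel of a dense 2D image storing its range; the 64 x 2048 resolution and the vertical field of view are illustrative assumptions typical of a 64-beam sensor.

    import numpy as np

    def points_to_range_image(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
        """Project an (N, 3) LiDAR point cloud to a dense (H, W) range image."""
        fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points[:, :3], axis=1)       # range of each point
        yaw = np.arctan2(y, x)                          # azimuth in [-pi, pi]
        pitch = np.arcsin(z / np.maximum(r, 1e-6))      # elevation angle

        # Normalize the angles into pixel coordinates.
        col = (0.5 * (yaw / np.pi + 1.0) * W).astype(np.int32)
        row = ((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H).astype(np.int32)
        col = np.clip(col, 0, W - 1)
        row = np.clip(row, 0, H - 1)

        range_image = np.zeros((H, W), dtype=np.float32)
        # Later points overwrite earlier ones here; a production projection
        # would keep the nearest point per pixel.
        range_image[row, col] = r
        return range_image, row, col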
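The abstract does not spell out the exact DRB configuration, so the following PyTorch sketch shows one plausible reading: parallel 3x3 convolutions at assumed dilation rates (1, 2, 4) are fused by a 1x1 convolution and added back through a residual connection, which is how dilation enlarges the receptive field without shrinking the feature map.

    import torch
    import torch.nn as nn

    class DilatedResidualBlock(nn.Module):
        """Parallel dilated 3x3 convolutions fused by a 1x1 convolution,
        with a residual connection back to the input (assumed layout)."""

        def __init__(self, channels, dilations=(1, 2, 4)):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                    nn.BatchNorm2d(channels),
                    nn.ReLU(inplace=True),
                )
                for d in dilations
            ])
            # Fuse the multi-dilation branches back to the input width.
            self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

        def forward(self, x):
            out = torch.cat([branch(x) for branch in self.branches], dim=1)
            return x + self.fuse(out)  # residual connection preserves fine detail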
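The RV-PV-BEV transfer can be sketched in the same hedged spirit: features learned on the range image are gathered back onto the points (the point view that serves as the bridge), then scattered into a BEV grid where the anchors are defined. The grid extents and resolution below are illustrative assumptions, not values from the paper.

    import torch

    def rv_to_bev(rv_feat, rows, cols, points_xy,
                  x_range=(0.0, 70.4), y_range=(-40.0, 40.0), bev_hw=(200, 176)):
        """rv_feat: (C, H, W) range-view feature map.
        rows, cols: (N,) long tensors with each point's range-image pixel.
        points_xy: (N, 2) point coordinates in the ego frame."""
        C = rv_feat.shape[0]
        point_feat = rv_feat[:, rows, cols]            # (C, N): the "point view"

        Hb, Wb = bev_hw
        ix = ((points_xy[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * Hb).long()
        iy = ((points_xy[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * Wb).long()
        keep = (ix >= 0) & (ix < Hb) & (iy >= 0) & (iy < Wb)

        bev = torch.zeros(C, Hb, Wb, dtype=rv_feat.dtype)
        # Simple overwrite scatter; max or mean pooling per cell is also common.
        bev[:, ix[keep], iy[keep]] = point_feat[:, keep]
        return bev, point_feat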
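Finally, as one hedged example of the kind of range-image augmentation the abstract alludes to (the paper's own strategy is only detailed in the full text), a cyclic roll along the width axis is equivalent to a global yaw rotation of the scene and keeps the image dense:

    import numpy as np

    def random_azimuth_roll(range_image, rng=None):
        """Cyclically shift a (..., H, W) range image along the width axis."""
        if rng is None:
            rng = np.random.default_rng()
        shift = int(rng.integers(0, range_image.shape[-1]))  # azimuth shift in pixels
        return np.roll(range_image, shift, axis=-1)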
Related papers
- What Matters in Range View 3D Object Detection [15.147558647138629]
Lidar-based perception pipelines rely on 3D object detection models to interpret complex scenes.
We achieve state-of-the-art performance among range-view 3D object detection models without using multiple techniques proposed in past range-view literature.
arXiv Detail & Related papers (2024-07-23T18:42:37Z)
- WidthFormer: Toward Efficient Transformer-based BEV View Transformation [21.10523575080856]
WidthFormer is a transformer-based module to compute Bird's-Eye-View (BEV) representations from multi-view cameras for real-time autonomous-driving applications.
We first introduce a novel 3D positional encoding mechanism capable of accurately encapsulating 3D geometric information.
We then develop two modules to compensate for potential information loss due to feature compression.
arXiv Detail & Related papers (2024-01-08T11:50:23Z)
- VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images [90.60881721134656]
We propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT).
Experiments on KITTI Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values.
arXiv Detail & Related papers (2022-06-06T14:02:06Z)
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
- RAANet: Range-Aware Attention Network for LiDAR-based 3D Object Detection with Auxiliary Density Level Estimation [11.180128679075716]
Range-Aware Attention Network (RAANet) is developed for 3D object detection from LiDAR data for autonomous driving.
RAANet extracts more powerful BEV features and generates superior 3D object detections.
Experiments on nuScenes dataset demonstrate that our proposed approach outperforms the state-of-the-art methods for LiDAR-based 3D object detection.
arXiv Detail & Related papers (2021-11-18T04:20:13Z)
- RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection [48.76483606935675]
We propose an anchor-free single-stage LiDAR-based 3D object detector -- RangeDet.
Compared with the commonly used voxelized or Bird's Eye View (BEV) representations, the range view representation is more compact and free of quantization error.
Our best model achieves 72.9/75.9/65.8 3D AP on vehicle/pedestrian/cyclist on the Waymo Open Dataset.
arXiv Detail & Related papers (2021-03-18T06:18:51Z)
- PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection [100.60209139039472]
We propose the Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for accurate 3D detection from point clouds.
Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Waymo Open dataset and the highly competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module: adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)