Pillar R-CNN for Point Cloud 3D Object Detection
- URL: http://arxiv.org/abs/2302.13301v1
- Date: Sun, 26 Feb 2023 12:07:25 GMT
- Title: Pillar R-CNN for Point Cloud 3D Object Detection
- Authors: Guangsheng Shi, Ruifeng Li and Chao Ma
- Abstract summary: We devise a conceptually simple yet effective two-stage 3D detection architecture, named Pillar R-CNN.
Our Pillar R-CNN performs favorably against state-of-the-art 3D detectors on the large-scale Waymo Open Dataset.
The effective and elegant Pillar R-CNN architecture opens the door to further exploration of BEV perception for autonomous driving.
- Score: 4.169126928311421
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performance of point cloud 3D object detection hinges on effectively
representing raw points, grid-based voxels or pillars. Recent two-stage 3D
detectors typically adopt the point-voxel-based R-CNN paradigm, i.e., the first
stage uses a 3D voxel-based backbone to generate 3D proposals on the
bird's-eye-view (BEV) representation and the second stage refines them via an
intermediate point representation. Their core mechanism uses intermediary
keypoints to restore 3D structural context from the converted BEV
representation. This intricate point-voxel feature interaction, however, makes
the entire detection pipeline more complex and
compute-intensive. In this paper, we take a different viewpoint -- the
pillar-based BEV representation has sufficient capacity to preserve the 3D
structure. In light of the latest advances in BEV-based perception, we devise a
conceptually simple yet effective two-stage 3D detection architecture, named
Pillar R-CNN. On top of densified BEV feature maps, Pillar R-CNN can easily
introduce the feature pyramid architecture to generate 3D proposals at various
scales and adopt a simple 2D R-CNN style detection head for box refinement. Our
Pillar R-CNN performs favorably against state-of-the-art 3D detectors on the
large-scale Waymo Open Dataset but at a small extra cost. Notably, the
effective and elegant Pillar R-CNN architecture opens the door to further
exploration of BEV perception for autonomous driving applications.
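To make the pipeline concrete, below is a minimal PyTorch sketch of the two-stage idea the abstract describes: pillar features scattered into a dense BEV map, a small feature pyramid built on top, and a plain 2D RoIAlign head that refines BEV proposal boxes. This is an illustrative sketch, not the authors' implementation; all module names, channel sizes, and the toy proposal box below are assumptions.

```python
# Illustrative sketch of a Pillar R-CNN style two-stage pipeline (NOT the
# authors' code; module names, channel sizes, and the toy proposal are
# assumptions for demonstration).
import torch
import torch.nn as nn
from torchvision.ops import roi_align


class PillarBEVBackbone(nn.Module):
    """Build a tiny two-level pyramid on a densified pillar BEV map."""

    def __init__(self, in_ch=64):
        super().__init__()
        self.stride4 = nn.Sequential(  # fine BEV level (effective stride 4)
            nn.Conv2d(in_ch, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.stride8 = nn.Sequential(  # coarse BEV level (effective stride 8)
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.lateral = nn.Conv2d(256, 128, 1)  # FPN-style top-down fusion
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, bev):  # bev: (B, in_ch, H, W) densified pillar features
        p4 = self.stride4(bev)
        p8 = self.stride8(p4)
        return p4 + self.up(self.lateral(p8))  # fused pyramid feature


class BEVRoIHead(nn.Module):
    """2D R-CNN style refinement: RoIAlign on BEV features, then regression."""

    def __init__(self, ch=128, pool=7, box_dim=7):
        super().__init__()
        self.pool = pool
        self.mlp = nn.Sequential(
            nn.Linear(ch * pool * pool, 256), nn.ReLU(),
            nn.Linear(256, box_dim + 1),  # 7-DoF box residual + confidence
        )

    def forward(self, feat, rois, stride):
        # rois: (N, 5) = (batch_idx, x1, y1, x2, y2) axis-aligned BEV boxes
        crops = roi_align(feat, rois, (self.pool, self.pool),
                          spatial_scale=1.0 / stride, aligned=True)
        return self.mlp(crops.flatten(1))


if __name__ == "__main__":
    bev = torch.randn(1, 64, 256, 256)       # pillars scattered to a dense grid
    feat = PillarBEVBackbone()(bev)           # (1, 128, 64, 64), BEV stride 4
    # Toy stand-in for first-stage proposals (normally from an RPN on the pyramid):
    rois = torch.tensor([[0, 40.0, 60.0, 80.0, 100.0]])
    out = BEVRoIHead()(feat, rois, stride=4)
    print(out.shape)                          # torch.Size([1, 8])
```

In this sketch the refinement head regresses a box residual plus a confidence score from pooled BEV features alone, reflecting the paper's premise that the pillar-based BEV representation preserves enough 3D structure for second-stage refinement without intermediate keypoints.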
Related papers
- LiDAR-Based 3D Object Detection via Hybrid 2D Semantic Scene Generation [38.38852904444365]
This paper proposes a novel scene representation that encodes both the semantics and geometry of the 3D environment in 2D.
Our simple yet effective design can be easily integrated into most state-of-the-art 3D object detectors.
arXiv Detail & Related papers (2023-04-04T04:05:56Z)
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z)
- SRCN3D: Sparse R-CNN 3D for Compact Convolutional Multi-View 3D Object Detection and Tracking [12.285423418301683]
This paper proposes Sparse R-CNN 3D (SRCN3D), a novel two-stage fully-sparse detector that incorporates sparse queries, sparse attention with box-wise sampling, and sparse prediction.
Experiments on nuScenes dataset demonstrate that SRCN3D achieves competitive performance in both 3D object detection and multi-object tracking tasks.
arXiv Detail & Related papers (2022-06-29T07:58:39Z)
- From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection [101.20784125067559]
We propose a new architecture, namely Hallucinated Hollow-3D R-CNN, to address the problem of 3D object detection.
In our approach, we first extract multi-view features by sequentially projecting the point clouds into the perspective view and the bird's-eye view.
The 3D objects are detected via a box refinement module with a novel Hierarchical Voxel RoI Pooling operation.
arXiv Detail & Related papers (2021-07-30T02:00:06Z)
- PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection [100.60209139039472]
We propose the Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for accurate 3D detection from point clouds.
Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z)
- Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection [99.16162624992424]
We devise a simple but effective voxel-based framework, named Voxel R-CNN.
By taking full advantage of voxel features in a two-stage approach, our method achieves detection accuracy comparable with state-of-the-art point-based models.
Our results show that Voxel R-CNN delivers higher detection accuracy while maintaining a real-time frame processing rate, i.e., a speed of 25 FPS on an NVIDIA RTX 2080 Ti GPU.
arXiv Detail & Related papers (2020-12-31T17:02:46Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z)
- PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection [76.30585706811993]
We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN).
Our proposed method deeply integrates both 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction.
It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks.
arXiv Detail & Related papers (2019-12-31T06:34:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.