Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object
Detection
- URL: http://arxiv.org/abs/2109.02499v1
- Date: Mon, 6 Sep 2021 14:17:51 GMT
- Title: Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object
Detection
- Authors: Jiageng Mao and Minzhe Niu and Haoyue Bai and Xiaodan Liang and Hang
Xu and Chunjing Xu
- Abstract summary: We present a flexible and high-performance framework, named Pyramid R-CNN, for two-stage 3D object detection from point clouds.
We propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest.
Our pyramid RoI head is robust to the sparse and imbalanced circumstances, and can be applied upon various 3D backbones to consistently boost the detection performance.
- Score: 89.66162518035144
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present a flexible and high-performance framework, named Pyramid R-CNN,
for two-stage 3D object detection from point clouds. Current approaches
generally rely on the points or voxels of interest for RoI feature extraction
in the second stage, but cannot effectively handle the sparsity and non-uniform
distribution of those points, and this may result in failures in detecting
objects that are far away. To resolve the problems, we propose a novel
second-stage module, named pyramid RoI head, to adaptively learn the features
from the sparse points of interest. The pyramid RoI head consists of three key
components. Firstly, we propose the RoI-grid Pyramid, which mitigates the
sparsity problem by extensively collecting points of interest for each RoI in a
pyramid manner. Secondly, we propose RoI-grid Attention, a new operation that
can encode richer information from sparse points by incorporating conventional
attention-based and graph-based point operators into a unified formulation.
Thirdly, we propose the Density-Aware Radius Prediction (DARP) module, which
can adapt to different point density levels by dynamically adjusting the
focusing range of RoIs. Combining the three components, our pyramid RoI head is
robust to the sparse and imbalanced circumstances, and can be applied upon
various 3D backbones to consistently boost the detection performance. Extensive
experiments show that Pyramid R-CNN outperforms the state-of-the-art 3D
detection models by a large margin on both the KITTI dataset and the Waymo Open
dataset.
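As a rough illustration of the RoI-grid Pyramid and DARP ideas described in the abstract, the sketch below (function names, shapes, and constants are hypothetical, not taken from the paper's code) builds multi-level RoI grids over progressively enlarged copies of an RoI, and maps a point-density estimate to a gathering radius so that sparse RoIs look farther for support points:

```python
import numpy as np

def roi_grid_pyramid(roi_center, roi_size, levels=(2, 3, 4), enlarge=1.5):
    """Hypothetical sketch of an RoI-grid Pyramid: each pyramid level places a
    denser grid over a progressively enlarged copy of the RoI, so sparse
    far-away objects still have points of interest to aggregate from."""
    grids = []
    size = np.asarray(roi_size, dtype=float)
    center = np.asarray(roi_center, dtype=float)
    for lvl, n in enumerate(levels):
        # Enlarge the RoI at higher levels to extensively collect context points.
        half = 0.5 * size * (enlarge ** lvl)
        ticks = [np.linspace(-h, h, n) for h in half]  # n grid points per axis
        gx, gy, gz = np.meshgrid(*ticks, indexing="ij")
        grid = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3) + center
        grids.append(grid)
    return grids

def predict_focus_radius(point_density, r_min=0.4, r_max=3.2):
    """Hypothetical stand-in for the DARP idea: map a scalar point-density
    estimate in [0, 1] to a focusing radius, so sparse RoIs (low density)
    gather points from a wider range. The paper predicts this dynamically
    from learned features; a fixed linear map is used here for illustration."""
    density = np.clip(point_density, 0.0, 1.0)
    return r_max - (r_max - r_min) * density  # dense -> small radius
```

For example, `roi_grid_pyramid([0, 0, 0], [2, 2, 2])` returns three grids of 8, 27, and 64 points over increasingly enlarged boxes, while `predict_focus_radius(0.0)` yields the maximum radius for a fully sparse RoI.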
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - PG-RCNN: Semantic Surface Point Generation for 3D Object Detection [19.341260543105548]
Point Generation R-CNN (PG-RCNN) is a novel end-to-end detector for 3D object detection.
It uses a jointly trained RoI point generation module to process the contextual information of RoIs.
For every generated point, PG-RCNN assigns a semantic feature that indicates the estimated foreground probability.
arXiv Detail & Related papers (2023-07-24T09:22:09Z) - R2Det: Redemption from Range-view for Accurate 3D Object Detection [16.855672228478074]
Redemption from Range-view Module (R2M) is a plug-and-play approach for 3D surface texture enhancement from the 2D range view to the 3D point view.
R2M can be seamlessly integrated into state-of-the-art LiDAR-based 3D object detectors as preprocessing.
arXiv Detail & Related papers (2023-07-21T10:36:05Z) - CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates some high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z) - FusionRCNN: LiDAR-Camera Fusion for Two-stage 3D Object Detection [11.962073589763676]
Existing 3D detectors significantly improve the accuracy by adopting a two-stage paradigm.
The sparsity of point clouds, especially for the points far away, makes it difficult for the LiDAR-only refinement module to accurately recognize and locate objects.
We propose a novel multi-modality two-stage approach named FusionRCNN, which effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI).
FusionRCNN significantly improves the strong SECOND baseline by 6.14% mAP and outperforms competing two-stage approaches.
arXiv Detail & Related papers (2022-09-22T02:07:25Z) - Graph R-CNN: Towards Accurate 3D Object Detection with
Semantic-Decorated Local Graph [26.226885108862735]
Two-stage detectors have gained much popularity in 3D object detection.
Most two-stage 3D detectors utilize grid points, voxel grids, or sampled keypoints for RoI feature extraction in the second stage.
This paper addresses the limitations of these RoI feature extraction schemes in three aspects.
arXiv Detail & Related papers (2022-08-07T02:56:56Z) - DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection [83.18142309597984]
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving.
We develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods.
arXiv Detail & Related papers (2022-03-15T18:46:06Z) - M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - Scale-Equalizing Pyramid Convolution for Object Detection [22.516829622445062]
Feature pyramid has been an efficient method to extract features at different scales.
Inspired by this, a convolution across pyramid levels, termed pyramid convolution, is proposed in this study as a modified 3-D convolution.
Stacked pyramid convolutions directly extract 3-D (scale and spatial) features and outperform other meticulously designed feature fusion modules.
arXiv Detail & Related papers (2020-05-06T19:34:56Z)
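The pyramid-convolution idea in the entry above can be sketched as a single 3-D convolution whose extra axis runs over pyramid levels. The toy NumPy version below is an illustrative assumption, not the paper's implementation, and it presumes the pyramid levels have already been resized to a common spatial resolution:

```python
import numpy as np

def pyramid_conv(feature_pyramid, kernel):
    """Toy sketch of pyramid convolution: stack same-sized pyramid levels
    along a 'scale' axis and run a valid 3-D (scale, y, x) cross-correlation,
    so the kernel mixes neighboring scales as well as spatial neighbors.
    feature_pyramid: array of shape (L, H, W); kernel: (kL, kH, kW)."""
    L, H, W = feature_pyramid.shape
    kL, kH, kW = kernel.shape
    out = np.zeros((L - kL + 1, H - kH + 1, W - kW + 1))
    for s in range(out.shape[0]):          # slide over pyramid levels
        for i in range(out.shape[1]):      # slide over rows
            for j in range(out.shape[2]):  # slide over columns
                patch = feature_pyramid[s:s + kL, i:i + kH, j:j + kW]
                out[s, i, j] = np.sum(patch * kernel)
    return out
```

With a 1x1x1 identity kernel the operation reduces to an ordinary per-level pass-through, while a kernel spanning multiple levels blends features across adjacent scales, which is the point of the technique.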
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.