PillarNet: Real-Time and High-Performance Pillar-based 3D Object
Detection
- URL: http://arxiv.org/abs/2205.07403v2
- Date: Thu, 19 May 2022 07:37:11 GMT
- Title: PillarNet: Real-Time and High-Performance Pillar-based 3D Object
Detection
- Authors: Guangsheng Shi, Ruifeng Li and Chao Ma
- Abstract summary: Real-time and high-performance 3D object detection is of critical importance for autonomous driving.
Recent top-performing 3D object detectors mainly rely on point-based or 3D voxel-based convolutions.
We develop a real-time and high-performance pillar-based detector, dubbed PillarNet.
- Score: 4.169126928311421
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Real-time and high-performance 3D object detection is of critical importance
for autonomous driving. Recent top-performing 3D object detectors mainly rely
on point-based or 3D voxel-based convolutions, both of which are computationally
inefficient for onboard deployment. While recent research focuses on such
point-based or 3D voxel-based convolutions for higher accuracy, these methods
fail to meet latency and power-efficiency requirements, especially for
deployment on embedded devices. In contrast, pillar-based methods use only 2D
convolutions, which consume fewer computational resources, but they lag far behind
their voxel-based counterparts in detection accuracy. The superiority of 3D
voxel-based methods over pillar-based methods is still broadly attributed to the
effectiveness of the 3D convolutional neural network (CNN). In this paper, by
examining the primary performance gap between pillar- and voxel-based detectors,
we develop a real-time and high-performance pillar-based detector, dubbed
PillarNet. The proposed PillarNet consists of a powerful encoder network for
effective pillar feature learning, a neck network for spatial-semantic feature
fusion, and a commonly used detection head. Using only 2D convolutions,
PillarNet is flexible with respect to pillar size and compatible with classical
2D CNN backbones, such as VGGNet and ResNet. Additionally, PillarNet benefits
from our designed orientation-decoupled IoU regression loss along with an
IoU-aware prediction branch. Extensive experiments on the large-scale nuScenes
Dataset and Waymo Open Dataset demonstrate that the proposed PillarNet
outperforms state-of-the-art 3D detectors in both effectiveness and efficiency.
Code will be made publicly available.
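To make the pillar paradigm the abstract describes concrete, here is a minimal NumPy sketch of pillar encoding in the PointPillars style: points are scattered onto a BEV grid, mean-pooled per pillar, and reshaped into a dense 2D pseudo-image that any 2D CNN backbone can consume. This is not PillarNet's learned encoder; the grid ranges, pillar size, and function names are illustrative assumptions.

```python
import numpy as np

def pillarize(points, x_range=(0.0, 51.2), y_range=(-25.6, 25.6),
              pillar_size=0.2, feat_dim=4):
    """Scatter an (N, feat_dim) point cloud into a dense BEV pseudo-image.

    Each pillar's feature is the mean of the points falling into it.
    PillarNet's actual encoder is learned; this shows only the generic
    pillar-encoding idea.
    """
    nx = int((x_range[1] - x_range[0]) / pillar_size)
    ny = int((y_range[1] - y_range[0]) / pillar_size)

    # Keep points inside the BEV range.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]

    # Integer pillar coordinates for every point.
    ix = ((pts[:, 0] - x_range[0]) / pillar_size).astype(np.int64)
    iy = ((pts[:, 1] - y_range[0]) / pillar_size).astype(np.int64)
    flat = iy * nx + ix

    # Mean-pool point features into their pillar (no cap on points per
    # pillar), then reshape to a (feat_dim, ny, nx) image that a 2D CNN
    # backbone (VGGNet/ResNet-style) can consume.
    sums = np.zeros((ny * nx, feat_dim))
    counts = np.zeros(ny * nx)
    np.add.at(sums, flat, pts[:, :feat_dim])
    np.add.at(counts, flat, 1.0)
    means = sums / np.maximum(counts, 1.0)[:, None]
    return means.reshape(ny, nx, feat_dim).transpose(2, 0, 1)

# Usage: 1000 random points with (x, y, z, intensity) features.
cloud = np.random.rand(1000, 4) * [51.2, 51.2, 4.0, 1.0] + [0.0, -25.6, -2.0, 0.0]
bev = pillarize(cloud)
print(bev.shape)  # (4, 256, 256)
```

Because the output is a dense 2D pseudo-image, any off-the-shelf 2D backbone can be applied directly, which is the property PillarNet exploits to stay flexible in pillar size and backbone choice.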
Related papers
- PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based
3D Object Detection [33.00510927880774]
We show the effectiveness of 2D backbone scaling and pretraining for pillar-based 3D object detectors.
Our proposed pillar-based detector, PillarNeSt, outperforms existing 3D object detectors by a large margin on the nuScenes and Argoverse 2 datasets.
arXiv Detail & Related papers (2023-11-29T16:11:33Z) - HEDNet: A Hierarchical Encoder-Decoder Network for 3D Object Detection
in Point Clouds [19.1921315424192]
3D object detection in point clouds is important for autonomous driving systems.
A primary challenge in 3D object detection stems from the sparse distribution of points within the 3D scene.
We propose HEDNet, a hierarchical encoder-decoder network for 3D object detection.
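The summary above only names the architecture family; below is a rough NumPy sketch of the generic hierarchical encoder-decoder pattern over a BEV feature map. Convolutions are omitted and all names are illustrative assumptions, not HEDNet's actual blocks.

```python
import numpy as np

def avg_pool2(x):
    """Stride-2 average pooling over a (C, H, W) feature map."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def encoder_decoder(bev, depth=3):
    """Multi-scale skeleton: downsample, then upsample with skip additions.

    Only the hierarchical structure is shown; real networks interleave
    convolutions at every scale.
    """
    skips, x = [], bev
    for _ in range(depth):       # encoder: build a feature pyramid
        skips.append(x)
        x = avg_pool2(x)
    for s in reversed(skips):    # decoder: recover spatial resolution
        x = upsample2(x) + s     # skip connection restores detail
    return x

bev = np.random.rand(4, 256, 256)
print(encoder_decoder(bev).shape)  # (4, 256, 256)
```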
arXiv Detail & Related papers (2023-10-31T07:32:08Z) - PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR
Point Clouds [29.15589024703907]
In this paper, we revisit the local point aggregators from the perspective of allocating computational resources.
We find that the simplest pillar-based models perform surprisingly well considering both accuracy and latency.
Our results challenge the common intuition that detailed geometry modeling is essential to achieving high performance in 3D object detection.
arXiv Detail & Related papers (2023-05-08T17:59:14Z) - Rethinking Voxelization and Classification for 3D Object Detection [68.8204255655161]
The main challenge in 3D object detection from LiDAR point clouds is achieving real-time performance without affecting the reliability of the network.
We present a solution to improve network inference speed and precision at the same time by implementing a fast dynamic voxelizer.
In addition, we propose a lightweight detection sub-head model that classifies predicted objects and filters out falsely detected ones.
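As a rough illustration of what dynamic voxelization means in practice, here is a minimal NumPy sketch: every point is mapped to a voxel coordinate and grouped without a fixed points-per-voxel cap, so no points are dropped and no per-voxel buffers are pre-allocated. The voxel sizes and names are assumptions; this is the generic idea, not the paper's specific fast voxelizer.

```python
import numpy as np

def dynamic_voxelize(points, voxel_size=(0.1, 0.1, 0.2),
                     origin=(0.0, -40.0, -3.0)):
    """Assign every point to a voxel without a fixed points-per-voxel cap.

    Returns the unique occupied voxels plus, for each point, the index of
    the voxel it belongs to.
    """
    vsize = np.asarray(voxel_size)
    coords = np.floor((points[:, :3] - np.asarray(origin)) / vsize).astype(np.int64)
    # One pass over the cloud: unlike hard voxelization, no points are
    # dropped and no buffer size has to be chosen in advance.
    voxels, point2voxel = np.unique(coords, axis=0, return_inverse=True)
    return voxels, point2voxel

pts = np.random.rand(5000, 4) * [70.0, 80.0, 4.0, 1.0] + [0.0, -40.0, -3.0, 0.0]
voxels, p2v = dynamic_voxelize(pts)
print(voxels.shape, p2v.shape)  # (num_occupied, 3) (5000,)
```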
arXiv Detail & Related papers (2023-01-10T16:22:04Z) - SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud
Representation [65.4396959244269]
The paper tackles the challenge by designing a general framework to construct 3D learning architectures.
The proposed approach can be applied to general backbones like PointNet and DGCNN.
Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN demonstrate that the method achieves a good trade-off among efficiency, rotation robustness, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z) - CVFNet: Real-time 3D Object Detection by Learning Cross View Features [11.402076835949824]
We present a real-time view-based single stage 3D object detector, namely CVFNet.
We first propose a novel Point-Range feature fusion module that deeply integrates point and range view features in multiple stages.
Then, a special Slice Pillar is designed to preserve 3D geometry when transforming the obtained deep point-view features into bird's eye view.
arXiv Detail & Related papers (2022-03-13T06:23:18Z) - EGFN: Efficient Geometry Feature Network for Fast Stereo 3D Object
Detection [51.52496693690059]
Fast stereo-based 3D object detectors lag far behind high-precision-oriented methods in accuracy.
We argue that the main reason is the missing or poor 3D geometry feature representation in fast stereo-based methods.
The proposed EGFN outperforms YOLOStereo3D, an advanced fast method, by 5.16% on mAP$_{3d}$ at the cost of merely an additional 12 ms.
arXiv Detail & Related papers (2021-11-28T05:25:36Z) - Improved Pillar with Fine-grained Feature for 3D Object Detection [23.348710029787068]
3D object detection with LiDAR point clouds plays an important role in autonomous driving perception module.
Existing point-based methods struggle to meet speed requirements because they must process too many raw points.
2D grid-based methods, such as PointPillars, can easily achieve stable and efficient speed based on simple 2D convolutions.
arXiv Detail & Related papers (2021-10-12T14:53:14Z) - PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector
Representation for 3D Object Detection [100.60209139039472]
We propose the Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for accurate 3D detection from point clouds.
Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z) - Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection [99.16162624992424]
We devise a simple but effective voxel-based framework, named Voxel R-CNN.
By taking full advantage of voxel features in a two-stage approach, our method achieves detection accuracy comparable with state-of-the-art point-based models.
Our results show that Voxel R-CNN delivers higher detection accuracy while maintaining a real-time frame processing rate, i.e., a speed of 25 FPS on an NVIDIA 2080 Ti GPU.
arXiv Detail & Related papers (2020-12-31T17:02:46Z) - Local Grid Rendering Networks for 3D Object Detection in Point Clouds [98.02655863113154]
CNNs are powerful, but directly applying convolutions after voxelizing an entire point cloud into a dense regular 3D grid is computationally costly.
We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently.
We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets.
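To make the LGR idea concrete, here is a toy NumPy sketch that renders the cubic neighborhood of each sampled center into a small binary occupancy grid. The radius, grid size, and names are illustrative assumptions; LGR-Net's actual operator differs in detail.

```python
import numpy as np

def render_local_grids(points, centers, radius=1.0, grid=8):
    """Render each center's cubic neighborhood into a small occupancy grid.

    Instead of voxelizing the whole scene densely, only a
    (grid, grid, grid) volume around each sampled center is filled.
    """
    out = np.zeros((len(centers), grid, grid, grid), dtype=np.float32)
    cell = 2.0 * radius / grid
    for i, c in enumerate(centers):
        # Points within the local cube around this center.
        rel = points[:, :3] - c
        near = np.all(np.abs(rel) < radius, axis=1)
        idx = np.floor((rel[near] + radius) / cell).astype(np.int64)
        idx = np.clip(idx, 0, grid - 1)
        out[i, idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0  # binary occupancy
    return out

pts = np.random.rand(2000, 3) * 10.0
centers = pts[np.random.choice(len(pts), 16, replace=False)]
local = render_local_grids(pts, centers)
print(local.shape)  # (16, 8, 8, 8)
```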
arXiv Detail & Related papers (2020-07-04T13:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.