SPADE: Sparse Pillar-based 3D Object Detection Accelerator for
Autonomous Driving
- URL: http://arxiv.org/abs/2305.07522v3
- Date: Sat, 13 Jan 2024 09:13:25 GMT
- Title: SPADE: Sparse Pillar-based 3D Object Detection Accelerator for
Autonomous Driving
- Authors: Minjae Lee, Seongmin Park, Hyungmin Kim, Minyong Yoon, Janghwan Lee,
Jun Won Choi, Nam Sung Kim, Mingu Kang, Jungwook Choi
- Abstract summary: 3D object detection using point cloud (PC) data is essential for perception pipelines of autonomous driving.
PointPillars, a widely adopted bird's-eye view (BEV) encoding, aggregates 3D point cloud data into 2D pillars for fast and accurate 3D object detection.
We propose SPADE, an algorithm-hardware co-design strategy to maximize vector sparsity in pillar-based 3D object detection and accelerate vector-sparse convolution.
- Score: 18.745798346661097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D object detection using point cloud (PC) data is essential for perception
pipelines of autonomous driving, where efficient encoding is key to meeting
stringent resource and latency requirements. PointPillars, a widely adopted
bird's-eye view (BEV) encoding, aggregates 3D point cloud data into 2D pillars
for fast and accurate 3D object detection. However, the state-of-the-art
methods employing PointPillars overlook the inherent sparsity of pillar
encoding where only a valid pillar is encoded with a vector of channel
elements, missing opportunities for significant computational reduction.
Meanwhile, current sparse convolution accelerators are designed to handle only
element-wise activation sparsity and do not effectively address the vector
sparsity imposed by pillar encoding.
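To make the distinction concrete, here is a minimal sketch (illustrative NumPy; the grid size, occupancy rate, and names are assumptions, not the paper's code) of how pillar encoding produces vector sparsity and how a vector-sparse layer can exploit it:

```python
import numpy as np

H, W, C = 64, 64, 32                       # BEV grid and channel depth (illustrative)
rng = np.random.default_rng(0)

# Simulated pillar encoding: only ~10% of BEV cells contain LiDAR points,
# so only those cells carry a non-zero C-element feature vector.
valid = rng.random((H, W)) < 0.10          # pillar occupancy mask
bev = np.zeros((H, W, C), dtype=np.float32)
# ReLU-style activations: scattered element-wise zeros inside valid pillars.
bev[valid] = np.maximum(rng.standard_normal((int(valid.sum()), C)), 0).astype(np.float32)

# Element-wise accelerators see irregular scattered zeros; vector sparsity
# means entire C-element vectors are zero wherever the pillar is empty.
print(f"element-wise sparsity: {np.mean(bev == 0.0):.2f}")   # ~0.95, irregular
print(f"vector sparsity:       {np.mean(~valid):.2f}")       # ~0.90, structured

# A vector-sparse convolution can gather only the valid pillar vectors into
# a dense (N_valid, C) matrix and skip all computation for empty pillars;
# for a 1x1 kernel this reduces to a single GEMM over the gathered rows.
coords = np.argwhere(valid)                # (N_valid, 2) pillar coordinates
feats = bev[valid]                         # (N_valid, C) gathered features
w = rng.standard_normal((C, C)).astype(np.float32)
out = feats @ w                            # per-pillar GEMM, ~90% work skipped
print(coords.shape, feats.shape, out.shape)
```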
In this paper, we propose SPADE, an algorithm-hardware co-design strategy to
maximize vector sparsity in pillar-based 3D object detection and accelerate
vector-sparse convolution commensurate with the improved sparsity. SPADE
consists of three components: (1) a dynamic vector pruning algorithm balancing
accuracy and computation savings from vector sparsity, (2) a sparse coordinate
management hardware transforming 2D systolic array into a vector-sparse
convolution accelerator, and (3) sparsity-aware dataflow optimization tailoring
sparse convolution schedules for hardware efficiency. Taped out in a
commercial technology, SPADE reduces computation by 36.3--89.2\%
for representative 3D object detection networks and benchmarks, leading to
1.3--10.9$\times$ speedup and 1.5--12.6$\times$ energy savings compared to the
ideal dense accelerator design. These sparsity-proportional performance gains
equate to 4.1--28.8$\times$ speedup and 90.2--372.3$\times$ energy savings
compared to the counterpart server and edge platforms.
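As a rough illustration of component (1), dynamic vector pruning can be thought of as dropping whole pillar feature vectors rather than individual elements. The sketch below uses a vector-magnitude score and a keep ratio as the accuracy/computation knob; the criterion, ratio, and function name are assumptions for illustration, not SPADE's actual algorithm:

```python
import numpy as np

def prune_pillar_vectors(feats: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Hypothetical sketch: keep only the most important pillar vectors.

    feats: (N, C) feature vectors of the valid pillars.
    keep_ratio: fraction of pillars to keep; lower values save more
    computation at some cost in detection accuracy.
    Returns sorted indices of the surviving pillars.
    """
    scores = np.linalg.norm(feats, axis=1)             # per-vector importance proxy
    n_keep = max(1, int(round(keep_ratio * feats.shape[0])))
    keep = np.argpartition(scores, -n_keep)[-n_keep:]  # top-k vectors survive
    return np.sort(keep)

rng = np.random.default_rng(1)
feats = rng.standard_normal((500, 32)).astype(np.float32)
idx = prune_pillar_vectors(feats, keep_ratio=0.6)      # keep 60% of pillar vectors
print(f"kept {idx.size}/{feats.shape[0]} vectors")
```

Because entire vectors are removed, the savings map directly onto the vector-sparse convolution hardware: pruned pillars simply never enter the sparse coordinate management unit or the systolic array.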
Related papers
- Optimized CNNs for Rapid 3D Point Cloud Object Recognition [2.6462438855724826]
This study introduces a method for efficiently detecting objects within 3D point clouds using convolutional neural networks (CNNs).
Our approach adopts a unique feature-centric voting mechanism to construct convolutional layers that capitalize on the typical sparsity observed in input data.
The Vote3Deep models, with just three layers, outperform the previous state-of-the-art in both laser-only approaches and combined laser-vision methods.
arXiv Detail & Related papers (2024-12-03T21:42:30Z)
- Selectively Dilated Convolution for Accuracy-Preserving Sparse Pillar-based Embedded 3D Object Detection [15.661833433778147]
Dense pillar processing wastes computation since it ignores the inherent sparsity of pillars derived from scattered point cloud data.
We propose a selectively dilated convolution (SD-Conv) that evaluates the importance of encoded pillars and selectively dilates the convolution output.
This design supports SD-Conv without significant demands on area and size, realizing a superior trade-off between speedup and model accuracy.
arXiv Detail & Related papers (2024-08-25T10:14:43Z)
- PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
arXiv Detail & Related papers (2023-08-31T17:57:17Z)
- Ada3D: Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection [19.321076175294902]
Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving.
Their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles.
We propose an adaptive inference framework called Ada3D, which focuses on exploiting the input-level spatial redundancy.
arXiv Detail & Related papers (2023-07-17T02:58:51Z)
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2 seconds to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
- FastPillars: A Deployment-friendly Pillar-based 3D Detector [63.0697065653061]
Existing BEV-based (i.e., bird's-eye view) detectors favor sparse convolutions (known as SPConv) to speed up training and inference.
FastPillars delivers state-of-the-art accuracy on the Waymo Open Dataset with a 1.8X speedup and a 3.8 mAPH/L2 improvement over CenterPoint (SPConv-based).
arXiv Detail & Related papers (2023-02-05T12:13:27Z)
- PillarNet: Real-Time and High-Performance Pillar-based 3D Object Detection [4.169126928311421]
Real-time and high-performance 3D object detection is of critical importance for autonomous driving.
Recent top-performing 3D object detectors mainly rely on point-based or 3D voxel-based convolutions.
We develop a real-time and high-performance pillar-based detector, dubbed PillarNet.
arXiv Detail & Related papers (2022-05-16T00:14:50Z)
- TorchSparse: Efficient Point Cloud Inference Engine [24.541195361633523]
We introduce TorchSparse, a high-performance point cloud inference engine.
TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement.
It achieves 1.6x and 1.5x measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively.
arXiv Detail & Related papers (2022-04-21T17:58:30Z)
- SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
- Improved Pillar with Fine-grained Feature for 3D Object Detection [23.348710029787068]
3D object detection with LiDAR point clouds plays an important role in the perception module of autonomous driving.
Existing point-based methods struggle to meet speed requirements because they must process too many raw points.
2D grid-based methods, such as PointPillars, can easily achieve stable and efficient speed based on simple 2D convolution.
arXiv Detail & Related papers (2021-10-12T14:53:14Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)