Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for
Efficient 3D Object Detection
- URL: http://arxiv.org/abs/2307.08209v2
- Date: Wed, 9 Aug 2023 03:18:27 GMT
- Authors: Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao,
Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang
- Score: 19.321076175294902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Voxel-based methods have achieved state-of-the-art performance for 3D object
detection in autonomous driving. However, their significant computational and
memory costs pose a challenge for their application to resource-constrained
vehicles. One reason for this high resource consumption is the presence of a
large number of redundant background points in Lidar point clouds, resulting in
spatial redundancy in both 3D voxel and dense BEV map representations. To
address this issue, we propose an adaptive inference framework called Ada3D,
which focuses on exploiting the input-level spatial redundancy. Ada3D
adaptively filters the redundant input, guided by a lightweight importance
predictor and the unique properties of the Lidar point cloud. Additionally, we
utilize the BEV features' intrinsic sparsity by introducing the Sparsity
Preserving Batch Normalization. With Ada3D, we achieve 40% reduction for 3D
voxels and decrease the density of 2D BEV feature maps from 100% to 20% without
sacrificing accuracy. Ada3D reduces the model's computational and memory cost by
5x, and achieves 1.52x/1.45x end-to-end GPU latency speedup and 1.5x/4.5x GPU
peak memory reduction for the 3D and 2D backbones respectively.
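The Sparsity Preserving Batch Normalization mentioned above addresses a subtle problem: standard batch norm subtracts a mean and adds a bias at every cell, which shifts the empty (zero) cells of a sparse BEV map away from zero and densifies it. The exact formulation is in the paper; below is a minimal numpy sketch of the idea, assuming a per-channel variant where statistics are computed and applied only over occupied (nonzero) cells:

```python
import numpy as np

def sparsity_preserving_bn(bev, gamma=1.0, beta=0.0, eps=1e-5):
    """Per-channel normalization restricted to occupied (nonzero) cells.

    bev: (C, H, W) dense BEV feature map, mostly zeros.
    Standard BN would move empty cells off zero (via -mean and +beta),
    densifying the map; normalizing only occupied cells keeps empty
    cells exactly zero.
    """
    out = np.zeros_like(bev, dtype=float)
    for c in range(bev.shape[0]):
        mask = bev[c] != 0
        if not mask.any():
            continue  # channel is entirely empty; leave it all zeros
        vals = bev[c][mask]
        mean, var = vals.mean(), vals.var()
        out[c][mask] = gamma * (vals - mean) / np.sqrt(var + eps) + beta
    return out
```

Because empty cells stay exactly zero, downstream sparse kernels can continue to skip them, which is what makes the reported 100%-to-20% BEV density reduction usable end to end.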
Related papers
- DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection [42.07920565812081]
We propose a novel post-training weight pruning scheme for 3D object detection.
It determines redundant parameters in the pretrained model that lead to minimal distortion in both locality and confidence.
This framework aims to minimize detection distortion of network output to maximally maintain detection precision.
arXiv Detail & Related papers (2024-07-02T09:33:32Z)
- SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction [15.331332063879342]
We propose SparseOcc, an efficient occupancy network inspired by sparse point cloud processing.
SparseOcc achieves a remarkable 74.9% reduction on FLOPs over the dense baseline.
It also improves accuracy from 12.8% to 14.1% mIoU, which can in part be attributed to the sparse representation's ability to avoid hallucinations on empty voxels.
arXiv Detail & Related papers (2024-04-15T06:45:06Z)
- Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving [26.03800936700545]
We propose to regulate intermediate dense 3D features with the help of volume rendering.
Experimental results on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates fine-grained and appropriate extraction of dense 3D features.
arXiv Detail & Related papers (2023-12-19T04:09:05Z)
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2s to directly process a whole building of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
- Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection [85.08249413137558]
LiDAR-produced point clouds are the major source for most state-of-the-art 3D object detectors.
Small, distant, and incomplete objects with sparse or few points are often hard to detect.
We present Sparse2Dense, a new framework to efficiently boost 3D detection performance by learning to densify point clouds in latent space.
arXiv Detail & Related papers (2022-11-23T16:01:06Z)
- Spatial Pruned Sparse Convolution for Efficient 3D Object Detection [41.62839541489369]
3D scenes are dominated by a large number of background points, which are redundant for a detection task that mainly needs to focus on foreground objects.
In this paper, we analyze the major components of existing 3D CNNs and find that they ignore this data redundancy and further amplify it during down-sampling, incurring a large amount of extra, unnecessary computational overhead.
We propose a new convolution operator named spatial pruned sparse convolution (SPS-Conv), which includes two variants: spatial pruned submanifold sparse convolution (SPSS-Conv) and spatial pruned regular sparse convolution (SPRS-Conv).
arXiv Detail & Related papers (2022-09-28T16:19:06Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed to achieve this goal; it exploits both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z)
- 3D-FFS: Faster 3D object detection with Focused Frustum Search in sensor fusion based networks [0.0]
We propose 3D-FFS, a novel approach to make sensor fusion based 3D object detection networks significantly faster.
3D-FFS can substantially constrain the 3D search space and thereby significantly reduce training time, inference time and memory consumption.
Compared to F-ConvNet, we achieve improvements in training and inference times by up to 62.84% and 56.46%, respectively, while reducing the memory usage by up to 58.53%.
arXiv Detail & Related papers (2021-03-15T11:32:21Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
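A recurring pattern across several of the entries above (Ada3D's importance predictor, SPS-Conv, DSPDet3D's dynamic spatial pruning) is to score spatial locations with a cheap predictor and keep only the highest-scoring ones before running the expensive backbone. A minimal, framework-free sketch of that pattern follows; the feature-norm scorer is a hypothetical stand-in for the learned lightweight predictors these papers actually use:

```python
import numpy as np

def prune_voxels(coords, feats, keep_ratio=0.5, score_fn=None):
    """Keep the top-`keep_ratio` fraction of voxels by importance score.

    coords: (N, 3) integer voxel coordinates
    feats:  (N, C) per-voxel features
    score_fn: maps feats -> (N,) importance scores; here a simple
              feature-norm heuristic stands in for a learned predictor.
    """
    if score_fn is None:
        score_fn = lambda f: np.linalg.norm(f, axis=1)
    scores = score_fn(feats)
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(scores)[-k:]  # indices of the k highest-scoring voxels
    return coords[keep], feats[keep]
```

Dropping, say, 40% of voxels this way shrinks both the sparse 3D convolution workload and the density of the BEV map that the 2D backbone later consumes, which is the lever behind the latency and memory savings these papers report.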
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.