Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection
- URL: http://arxiv.org/abs/2307.08209v2
- Date: Wed, 9 Aug 2023 03:18:27 GMT
- Title: Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection
- Authors: Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao,
Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang
- Abstract summary: Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving.
Their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles.
We propose an adaptive inference framework called Ada3D, which focuses on exploiting the input-level spatial redundancy.
- Score: 19.321076175294902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Voxel-based methods have achieved state-of-the-art performance for 3D object
detection in autonomous driving. However, their significant computational and
memory costs pose a challenge for their application to resource-constrained
vehicles. One reason for this high resource consumption is the presence of a
large number of redundant background points in Lidar point clouds, resulting in
spatial redundancy in both 3D voxel and dense BEV map representations. To
address this issue, we propose an adaptive inference framework called Ada3D,
which focuses on exploiting the input-level spatial redundancy. Ada3D
adaptively filters the redundant input, guided by a lightweight importance
predictor and the unique properties of the Lidar point cloud. Additionally, we
utilize the BEV features' intrinsic sparsity by introducing the Sparsity
Preserving Batch Normalization. With Ada3D, we achieve a 40% reduction in 3D
voxels and decrease the density of 2D BEV feature maps from 100% to 20% without
sacrificing accuracy. Ada3D reduces the model's computational and memory cost by
5x, and achieves 1.52x/1.45x end-to-end GPU latency and 1.5x/4.5x GPU peak
memory optimization for the 3D and 2D backbones, respectively.
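
The abstract does not spell out how Sparsity Preserving Batch Normalization works, so the following is only a minimal sketch of why a sparsity-aware normalization can matter for BEV feature maps. It assumes a hypothetical scale-only variant (`scale_only_bn`) that divides by the per-channel standard deviation but skips mean subtraction; this is an illustrative assumption, not the authors' confirmed formulation. The tensor shape, the 20% occupancy, and all function names are likewise made up for the example.

```python
import numpy as np


def standard_bn(x, eps=1e-5):
    """Plain batch normalization over (N, H, W) for each channel of an NCHW array."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    # Subtracting the mean shifts every zero entry to -mean / std, so zeros are lost.
    return (x - mean) / np.sqrt(var + eps)


def scale_only_bn(x, eps=1e-5):
    """Hypothetical sparsity-preserving variant (assumption, not the paper's code).

    Only rescales by the per-channel standard deviation; without the mean
    shift, an entry that is zero on input is still zero on output.
    """
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return x / np.sqrt(var + eps)


def density(x):
    """Fraction of nonzero entries in the array."""
    return np.count_nonzero(x) / x.size


# Toy BEV feature map: 2 samples, 4 channels, 32x32 grid, about 20% occupied cells.
rng = np.random.default_rng(0)
features = rng.normal(loc=1.0, size=(2, 4, 32, 32))
occupied = rng.random((2, 1, 32, 32)) < 0.2
bev = features * occupied

print("input density:        ", density(bev))
print("after standard BN:    ", density(standard_bn(bev)))
print("after scale-only norm:", density(scale_only_bn(bev)))
```

In this toy setting the standard normalization turns the map fully dense, while the scale-only variant leaves the set of empty cells untouched, which is the property a sparse 2D backbone would need in order to keep skipping them.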
Related papers
- Robust 3D Semantic Occupancy Prediction with Calibration-free Spatial Transformation [32.50849425431012]
For autonomous cars equipped with multi-camera and LiDAR, it is critical to aggregate multi-sensor information into a unified 3D space for accurate and robust predictions.
Recent methods are mainly built on the 2D-to-3D transformation that relies on sensor calibration to project the 2D image information into the 3D space.
In this work, we propose a calibration-free spatial transformation based on vanilla attention to implicitly model the spatial correspondence.
arXiv Detail & Related papers (2024-11-19T02:40:42Z)
- 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection [12.14595005884025]
This paper introduces 3DGS into 3DOD for the first time, identifying two main challenges.
We propose an elegant and efficient solution by incorporating 2D Boundary Guidance.
We also propose a Box-Focused Sampling strategy using 2D boxes to generate object probability distribution in 3D spaces.
arXiv Detail & Related papers (2024-10-02T15:15:52Z)
- DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection [42.07920565812081]
We propose a novel post-training weight pruning scheme for 3D object detection.
It determines redundant parameters in the pretrained model that lead to minimal distortion in both locality and confidence.
This framework aims to minimize detection distortion of network output to maximally maintain detection precision.
arXiv Detail & Related papers (2024-07-02T09:33:32Z)
- SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction [15.331332063879342]
We propose SparseOcc, an efficient occupancy network inspired by sparse point cloud processing.
SparseOcc achieves a remarkable 74.9% reduction in FLOPs over the dense baseline.
It also improves accuracy, from 12.8% to 14.1% mIOU, which in part can be attributed to the sparse representation's ability to avoid hallucinations on empty voxels.
arXiv Detail & Related papers (2024-04-15T06:45:06Z)
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2s to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
- Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection [85.08249413137558]
LiDAR-produced point clouds are the major source for most state-of-the-art 3D object detectors.
Small, distant, and incomplete objects with sparse or few points are often hard to detect.
We present Sparse2Dense, a new framework to efficiently boost 3D detection performance by learning to densify point clouds in latent space.
arXiv Detail & Related papers (2022-11-23T16:01:06Z)
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation [105.96557764248846]
We introduce BEVFusion, a generic multi-task multi-sensor fusion framework.
It unifies multi-modal features in the shared bird's-eye view representation space.
It achieves 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower cost.
arXiv Detail & Related papers (2022-05-26T17:59:35Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D datasets.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- 3D-FFS: Faster 3D object detection with Focused Frustum Search in sensor fusion based networks [0.0]
We propose 3D-FFS, a novel approach to make sensor fusion based 3D object detection networks significantly faster.
3D-FFS can substantially constrain the 3D search space and thereby significantly reduce training time, inference time and memory consumption.
Compared to F-ConvNet, we achieve improvements in training and inference times by up to 62.84% and 56.46%, respectively, while reducing the memory usage by up to 58.53%.
arXiv Detail & Related papers (2021-03-15T11:32:21Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module: adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)