LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
- URL: http://arxiv.org/abs/2403.15173v1
- Date: Fri, 22 Mar 2024 12:54:33 GMT
- Title: LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
- Authors: Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang
- Abstract summary: We propose the Large Sparse Kernel 3D Neural Network (LSK3DNet), whose two core components are Spatial-wise Dynamic Sparsity (SDS) and Channel-wise Weight Selection (CWS).
- Score: 62.31333169413391
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous systems need to process large-scale, sparse, and irregular point clouds with limited compute resources. Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective. Although naively enlarging the 3D kernel size can enhance performance, it also leads to cubically increasing overhead. Therefore, it is crucial to develop streamlined 3D large-kernel designs that eliminate redundant weights and work effectively with larger kernels. In this paper, we propose an efficient and effective Large Sparse Kernel 3D Neural Network (LSK3DNet) that leverages dynamic pruning to amplify the 3D kernel size. Our method comprises two core components: Spatial-wise Dynamic Sparsity (SDS) and Channel-wise Weight Selection (CWS). SDS dynamically prunes and regrows volumetric weights from the beginning of training to learn a large sparse 3D kernel. It not only boosts performance but also significantly reduces model size and computational cost. Moreover, CWS selects the most important channels for 3D convolution during training and subsequently prunes the redundant channels to accelerate inference for 3D vision tasks. We demonstrate the effectiveness of LSK3DNet on three benchmark datasets and five tracks, comparing against classical models and large-kernel designs. Notably, LSK3DNet achieves state-of-the-art performance on SemanticKITTI (i.e., 75.6% on single-scan and 63.4% on multi-scan), with roughly 40% model size reduction and 60% computing operations reduction compared to the naive large 3D kernel model.
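As a rough illustration of the prune-and-regrow dynamic underlying SDS, one sparse-training update in the style of SET/RigL might look like the sketch below (a generic dynamic-sparsity step, not the authors' released code; all names are illustrative):
```python
import torch

def sds_step(weight: torch.Tensor, mask: torch.Tensor, drop_frac: float = 0.1):
    """One prune-and-regrow update on a sparse (C_out, C_in, k, k, k) kernel.
    A hedged sketch of the general dynamic-sparsity idea, not LSK3DNet's code."""
    w, m = weight.flatten(), mask.flatten()  # views into the same storage
    n_drop = int(m.sum().item() * drop_frac)
    if n_drop == 0:
        return mask
    # Prune: deactivate the n_drop smallest-magnitude active weights.
    scores = w.abs().masked_fill(m == 0, float("inf"))
    m[torch.topk(scores, n_drop, largest=False).indices] = 0
    # Regrow: reactivate the same number of inactive positions at random,
    # keeping overall sparsity constant while letting the kernel's active
    # spatial pattern move over training.
    inactive = (m == 0).nonzero(as_tuple=False).squeeze(1)
    grow = inactive[torch.randperm(inactive.numel())[:n_drop]]
    m[grow] = 1
    with torch.no_grad():
        w[grow] = 0.0  # regrown weights restart from zero
    return mask
```
In a setup like LSK3DNet's, such a mask would gate the enlarged 3D convolution kernel so only the surviving weights contribute to compute.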
Related papers
- E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation [36.367368163120794]
We propose a 3D medical image segmentation model, named Efficient to Efficient Network (E2ENet).
It incorporates two parametrically and computationally efficient designs.
It consistently achieves a superior trade-off between accuracy and efficiency across various resource constraints.
arXiv Detail & Related papers (2023-12-07T22:13:37Z)
- LinK: Linear Kernel for LiDAR-based 3D Perception [48.75602569945194]
We propose a new method, called LinK, to achieve a wider-range receptive field in a convolution-like manner with two core designs.
The proposed method successfully enables each voxel to perceive context within a range of 21x21x21.
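The summary leaves the two core designs unnamed; one generic way to get convolution-like aggregation over a 21x21x21 range without storing a dense weight tensor is to derive each neighbor's weight as a linear function of its offset. The sketch below illustrates only that generic reading (all names hypothetical), not LinK's actual mechanism:
```python
import torch
import torch.nn as nn

class LinearKernelAggSketch(nn.Module):
    """Generic 'linear kernel' aggregation: weights come from a linear map
    of relative offsets, so cost scales with the number of neighbors rather
    than with the kernel volume. Hypothetical, not LinK's implementation."""
    def __init__(self, dim: int):
        super().__init__()
        self.offset_proj = nn.Linear(3, dim)   # offset -> per-channel weight

    def forward(self, feats: torch.Tensor, offsets: torch.Tensor):
        # feats:   (N, K, dim)  features of K neighbors per voxel
        # offsets: (N, K, 3)    relative positions, e.g. within a 21^3 window
        w = self.offset_proj(offsets)          # (N, K, dim) linear weights
        return (w * feats).sum(dim=1)          # weighted aggregation per voxel
```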
arXiv Detail & Related papers (2023-03-28T16:02:30Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
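A loose sketch of the paired-attention idea (shared query/key projections feeding one branch that attends across spatial positions and one across channels); the real EPA block differs in its projections and efficiency tricks:
```python
import torch
import torch.nn as nn

class PairedAttentionSketch(nn.Module):
    """Spatial and channel attention branches with shared Q/K projections.
    An illustrative simplification, not the actual UNETR++ EPA block."""
    def __init__(self, dim: int):
        super().__init__()
        self.qk = nn.Linear(dim, 2 * dim)        # shared query/key projections
        self.v_spatial = nn.Linear(dim, dim)
        self.v_channel = nn.Linear(dim, dim)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor):          # x: (B, N, C) tokens
        q, k = self.qk(x).chunk(2, dim=-1)
        # Spatial branch: N x N attention over token positions.
        a_sp = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, -1)
        sp = a_sp @ self.v_spatial(x)
        # Channel branch: C x C attention over feature channels.
        a_ch = torch.softmax(q.transpose(-2, -1) @ k / q.shape[-2] ** 0.5, -1)
        ch = self.v_channel(x) @ a_ch
        return self.out(torch.cat([sp, ch], dim=-1))
```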
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- LKD-Net: Large Kernel Convolution Network for Single Image Dehazing [70.46392287128307]
We propose a novel Large Kernel Convolution Dehaze Block (LKD Block) consisting of the Decomposition deep-wise Large Kernel Convolution Block (DLKCB) and the Channel Enhanced Feed-forward Network (CEFN).
The designed DLKCB can split the depth-wise large kernel convolution into a smaller depth-wise convolution and a depth-wise dilated convolution without introducing massive parameters and computational overhead.
Our LKD-Net dramatically outperforms the Transformer-based method Dehamer with only 1.79% of its parameters and 48.9% of its FLOPs.
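The described split is the familiar large-kernel decomposition: a small depth-wise conv followed by a depth-wise dilated conv covers a large receptive field with few weights. A minimal sketch, with illustrative kernel sizes rather than the paper's exact configuration:
```python
import torch.nn as nn

class DLKCBSketch(nn.Module):
    """Hedged sketch of decomposing a depth-wise large-kernel convolution.
    Kernel sizes below are illustrative, not LKD-Net's exact settings."""
    def __init__(self, channels: int):
        super().__init__()
        # 5x5 depth-wise conv captures local structure.
        self.dw = nn.Conv2d(channels, channels, 5, padding=2, groups=channels)
        # 7x7 depth-wise conv with dilation 3 extends the effective receptive
        # field to roughly 21x21 without a 21x21 weight tensor.
        self.dw_dilated = nn.Conv2d(channels, channels, 7, padding=9,
                                    dilation=3, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)  # point-wise channel mixing

    def forward(self, x):
        return self.pw(self.dw_dilated(self.dw(x)))
```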
arXiv Detail & Related papers (2022-09-05T06:56:48Z)
- LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs [78.25819070166351]
We propose the spatial-wise partition convolution and its large-kernel module.
Our large-kernel 3D CNN network, LargeKernel3D, yields notable improvement in 3D tasks.
For the first time, we show that large kernels are feasible and essential for 3D visual tasks.
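One way to read "spatial-wise partition convolution" is weight sharing across spatial partitions of a large kernel, so a 7x7x7 extent costs only 3x3x3 parameters. A sketch of that reading (an illustration of the idea, not the paper's implementation):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartitionConv3dSketch(nn.Module):
    """Positions of a 7x7x7 kernel are grouped into 3x3x3 partitions that
    each reuse one shared weight, keeping parameters at 3^3 while the
    spatial extent stays 7^3. Hypothetical sketch only."""
    def __init__(self, c_in: int, c_out: int, big: int = 7, small: int = 3):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(c_out, c_in, small, small, small) * 0.02)
        # Map each of the 7 positions per axis to one of 3 partitions:
        # idx = [0, 0, 0, 1, 1, 2, 2]
        self.register_buffer("idx", torch.arange(big) * small // big)
        self.pad = big // 2

    def forward(self, x):
        i = self.idx
        # Expand the small weight tensor to the full 7x7x7 kernel by sharing
        # each partition's weight across its positions (gradients sum back).
        w = self.weight[:, :, i][:, :, :, i][:, :, :, :, i]
        return F.conv3d(x, w, padding=self.pad)
```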
arXiv Detail & Related papers (2022-06-21T17:35:57Z)
- DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks [29.46621102184345]
We propose a framework dubbed DepthShrinker to develop hardware-friendly compact networks.
Our framework delivers hardware-friendly compact networks that outperform both state-of-the-art efficient DNNs and compression techniques.
arXiv Detail & Related papers (2022-06-02T02:32:47Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We present a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
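A toy sketch of input-dependent weight slicing: a small gate picks a width ratio, and the layer executes only the first k filters. The gate design and training procedure here are simplified placeholders, not the DS-Net++ method:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SliceableConvSketch(nn.Module):
    """Runs a conv on only the first k output filters, with k chosen
    per input by a tiny gate. Hypothetical simplification of the idea."""
    def __init__(self, c_in: int, c_out: int, ratios=(0.25, 0.5, 1.0)):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.ratios = ratios
        self.gate = nn.Linear(c_in, len(ratios))  # picks a slice ratio

    def forward(self, x):
        # Choose a ratio from globally pooled features; for brevity we use
        # a hard argmax and apply the first sample's choice to the batch.
        logits = self.gate(x.mean(dim=(2, 3)))
        r = self.ratios[int(logits.argmax(dim=1)[0])]
        k = max(1, int(self.conv.out_channels * r))
        # Execute the convolution with only the first k filters.
        return F.conv2d(x, self.conv.weight[:k], self.conv.bias[:k], padding=1)
```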
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- Efficient embedding network for 3D brain tumor segmentation [0.33727511459109777]
In this paper, we investigate a way to transfer the performance of a two-dimensional classification network to three-dimensional semantic segmentation of brain tumors.
As the input data is in 3D, the first layers of the encoder are devoted to the reduction of the third dimension in order to fit the input of the EfficientNet network.
Experimental results on validation and test data from the BraTS 2020 challenge demonstrate that the proposed method achieves promising performance.
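A hypothetical sketch of how early 3D layers could collapse the depth axis so a 2D backbone such as EfficientNet can take over; layer counts and shapes are made up:
```python
import torch.nn as nn

class DepthReducerSketch(nn.Module):
    """Strided 3D convolutions shrink the depth dimension of a volume,
    after which the remainder is collapsed for a 2D encoder. Illustrative
    only; not the paper's exact architecture."""
    def __init__(self, in_ch: int = 4, out_ch: int = 3):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv3d(in_ch, 16, kernel_size=3, stride=(2, 1, 1), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(16, out_ch, kernel_size=3, stride=(2, 1, 1), padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):            # x: (B, C, D, H, W)
        x = self.reduce(x)           # depth shrunk by 4x
        return x.mean(dim=2)         # collapse remaining depth -> (B, 3, H, W)
```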
arXiv Detail & Related papers (2020-11-22T16:17:29Z)
- EDNet: Efficient Disparity Estimation with Cost Volume Combination and Attention-based Spatial Residual [17.638034176859932]
Existing disparity estimation works mostly leverage the 4D concatenation volume and construct a very deep 3D convolutional neural network (CNN) for disparity regression.
In this paper, we propose a network named EDNet for efficient disparity estimation.
Experiments on the Scene Flow and KITTI datasets show that EDNet outperforms the previous 3D CNN based works.
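For reference, the 4D concatenation cost volume those prior works build looks like the following textbook construction (a standard stereo-matching sketch, not EDNet's code):
```python
import torch

def concat_cost_volume(left: torch.Tensor, right: torch.Tensor, max_disp: int):
    """Standard 4D concatenation cost volume over candidate disparities.
    left/right: (B, C, H, W) feature maps -> volume: (B, 2C, max_disp, H, W)."""
    b, c, h, w = left.shape
    vol = left.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            vol[:, :c, d] = left
            vol[:, c:, d] = right
        else:
            # Pair each left pixel with the right pixel d columns to its left.
            vol[:, :c, d, :, d:] = left[:, :, :, d:]
            vol[:, c:, d, :, d:] = right[:, :, :, :-d]
    return vol
```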
arXiv Detail & Related papers (2020-10-26T04:49:44Z)