LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
- URL: http://arxiv.org/abs/2206.10555v2
- Date: Wed, 22 Mar 2023 12:43:10 GMT
- Title: LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
- Authors: Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
- Abstract summary: We propose the spatial-wise partition convolution and its large-kernel module.
Our large-kernel 3D CNN network, LargeKernel3D, yields notable improvement in 3D tasks.
For the first time, we show that large kernels are feasible and essential for 3D visual tasks.
- Score: 78.25819070166351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in 2D CNNs have revealed that large kernels are important. However, directly applying large convolutional kernels in 3D CNNs meets severe difficulties: module designs that succeed in 2D, including the popular depth-wise convolution, become surprisingly ineffective in 3D networks. To address this vital challenge, we instead propose the spatial-wise partition convolution and its large-kernel module, which avoid the optimization and efficiency issues of naive 3D large kernels. Our
large-kernel 3D CNN network, LargeKernel3D, yields notable improvement in 3D
tasks of semantic segmentation and object detection. It achieves 73.9% mIoU on
the ScanNetv2 semantic segmentation and 72.8% NDS nuScenes object detection
benchmarks, ranking 1st on the nuScenes LIDAR leaderboard. The performance
further improves to 74.2% NDS with a simple multi-modal fusion. In addition,
LargeKernel3D can be scaled to 17x17x17 kernel size on Waymo 3D object
detection. For the first time, we show that large kernels are feasible and
essential for 3D visual tasks.
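To make the core idea concrete, below is a minimal, dense PyTorch sketch of the weight-sharing scheme behind spatial-wise partition convolution: the positions of a large 3D kernel are grouped into a small grid of spatial blocks, and every position within a block shares one learnable weight, so a 7x7x7 kernel carries only as many parameters as a 3x3x3 one. The class name and the nearest-neighbor weight expansion are illustrative assumptions; the paper's actual operator works on sparse voxels (e.g. via a sparse-convolution library), not dense tensors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPartitionConv3d(nn.Module):
    """Illustrative dense stand-in for spatial-wise partition convolution.

    The kernel_size^3 positions of a large kernel are partitioned into
    partitions^3 spatial blocks; all positions in a block share one
    learnable weight, so the parameter count matches a small kernel.
    """

    def __init__(self, in_ch, out_ch, kernel_size=7, partitions=3):
        super().__init__()
        self.kernel_size = kernel_size
        # Only partitions^3 weight entries are learned (e.g. 3x3x3),
        # not kernel_size^3 (e.g. 7x7x7).
        self.weight = nn.Parameter(
            0.02 * torch.randn(out_ch, in_ch, partitions, partitions, partitions)
        )
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        # Expand the small shared-weight grid to the full kernel size by
        # nearest-neighbor repetition: each learned weight covers a block
        # of neighboring kernel positions.
        k = self.kernel_size
        big_weight = F.interpolate(self.weight, size=(k, k, k), mode="nearest")
        return F.conv3d(x, big_weight, self.bias, padding=k // 2)

# Shape check: a 7x7x7 receptive field with 3x3x3-sized parameters.
x = torch.randn(1, 16, 32, 32, 32)
conv = SpatialPartitionConv3d(16, 32, kernel_size=7, partitions=3)
print(conv(x).shape)      # torch.Size([1, 32, 32, 32, 32])
print(conv.weight.shape)  # torch.Size([32, 16, 3, 3, 3])
```

On sparse voxel data, this same grouping is what lets the kernel grow without the parameter blow-up and optimization trouble that the abstract attributes to naive 3D large kernels; the dense version above only illustrates the sharing scheme.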
Related papers
- LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels [62.31333169413391]
Large Sparse Kernel 3D Neural Network (LSK3DNet) comprises two core components: Spatial-wise Dynamic Sparsity (SDS) and Channel-wise Weight Selection (CWS).
arXiv Detail & Related papers (2024-03-22T12:54:33Z)
- LinK: Linear Kernel for LiDAR-based 3D Perception [48.75602569945194]
We propose a new method, LinK, that achieves a wider-range receptive field in a convolution-like manner with two core designs.
The proposed method successfully enables each voxel to perceive context within a range of 21x21x21.
arXiv Detail & Related papers (2023-03-28T16:02:30Z)
- Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection [85.08249413137558]
LiDAR-produced point clouds are the major source for most state-of-the-art 3D object detectors.
Small, distant, and incomplete objects with sparse or few points are often hard to detect.
We present Sparse2Dense, a new framework to efficiently boost 3D detection performance by learning to densify point clouds in latent space.
arXiv Detail & Related papers (2022-11-23T16:01:06Z)
- Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [148.0476219278875]
We revisit large kernel design in modern convolutional neural networks (CNNs).
Inspired by recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm.
We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to the commonly used 3x3; a minimal sketch of this design appears after this list.
arXiv Detail & Related papers (2022-03-13T17:22:44Z)
- Multi-Modality Task Cascade for 3D Object Detection [22.131228757850373]
Many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data.
We propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions.
We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.
arXiv Detail & Related papers (2021-07-08T17:55:01Z)
- To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels [30.3378171262436]
We design a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network.
Our method performs competitively on the Waymo Open Dataset and improves the state-of-the-art AP for pedestrian detection from 69.7% to 75.5%.
It is also efficient: our smallest model, which still outperforms the popular PointPillars in quality, requires 180 times fewer FLOPs and model parameters.
arXiv Detail & Related papers (2021-06-25T01:27:26Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
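As a point of reference for the 2D large-kernel design that RepLKNet revisits (the sketch promised above), here is a minimal, hedged PyTorch sketch of a depth-wise large-kernel block with a parallel small-kernel branch that eases optimization. The names are illustrative assumptions, and this is not the authors' released code, which additionally uses batch normalization and merges the branches via structural re-parameterization at inference.

```python
import torch
import torch.nn as nn

class LargeKernelDWBlock(nn.Module):
    """Sketch of a RepLKNet-style block: a depth-wise conv with a very
    large kernel plus a parallel small-kernel branch that eases training."""

    def __init__(self, channels, large_k=31, small_k=3):
        super().__init__()
        self.large = nn.Conv2d(channels, channels, large_k,
                               padding=large_k // 2, groups=channels)
        self.small = nn.Conv2d(channels, channels, small_k,
                               padding=small_k // 2, groups=channels)

    def forward(self, x):
        # Both branches are summed during training; after training, the
        # small kernel can be zero-padded to large_k and folded into the
        # large one, leaving a single conv at inference.
        return self.large(x) + self.small(x)

x = torch.randn(1, 64, 56, 56)
print(LargeKernelDWBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```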
This list is automatically generated from the titles and abstracts of the papers on this site.