LinK: Linear Kernel for LiDAR-based 3D Perception
- URL: http://arxiv.org/abs/2303.16094v1
- Date: Tue, 28 Mar 2023 16:02:30 GMT
- Title: LinK: Linear Kernel for LiDAR-based 3D Perception
- Authors: Tao Lu, Xiang Ding, Haisong Liu, Gangshan Wu, Limin Wang
- Abstract summary: We propose a new method, called LinK, to achieve a wider-range perception receptive field in a convolution-like manner with two core designs.
The proposed method successfully enables each voxel to perceive context within a range of 21x21x21.
- Score: 48.75602569945194
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Extending the success of 2D Large Kernel to 3D perception is challenging due
to: 1. the cubically-increasing overhead in processing 3D data; 2. the
optimization difficulties from data scarcity and sparsity. Previous work has
taken the first step to scale up the kernel size from 3x3x3 to 7x7x7 by
introducing block-shared weights. However, to reduce the feature variations
within a block, it only employs a modest block size and fails to achieve larger
kernels such as 21x21x21. To address this issue, we propose a new method,
called LinK, to achieve a wider-range perception receptive field in a
convolution-like manner with two core designs. The first is to replace the
static kernel matrix with a linear kernel generator, which adaptively provides
weights only for non-empty voxels. The second is to reuse the pre-computed
aggregation results in the overlapped blocks to reduce computation complexity.
The proposed method successfully enables each voxel to perceive context within
a range of 21x21x21. Extensive experiments on two basic perception tasks, 3D
object detection and 3D semantic segmentation, demonstrate the effectiveness of
our method. Notably, we rank 1st on the public leaderboard of the 3D detection
benchmark of nuScenes (LiDAR track), by simply incorporating a LinK-based
backbone into the basic detector, CenterPoint. We also boost the strong
segmentation baseline's mIoU by 2.7% on the SemanticKITTI test set. Code is
available at https://github.com/MCG-NJU/LinK.
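The two core designs described in the abstract can be illustrated with a short, self-contained sketch. The code below is not the authors' implementation (that is in the linked repository); it is a toy approximation in which a small MLP kernel generator produces weights only for the relative offsets of occupied blocks, and block-level feature sums are pre-computed once and reused by every voxel whose large neighborhood overlaps them. All names here (OffsetMLP, link_like_aggregate, block_size, reach) are illustrative assumptions, not identifiers from the paper.

```python
# Toy sketch (not the authors' code) of the two ideas in the abstract:
# (1) a kernel generator that provides weights only for non-empty voxels/blocks, and
# (2) reusing pre-computed block-level aggregates so each voxel can cover a large
#     (e.g. 21x21x21) range without visiting every cell in that range.
import torch
import torch.nn as nn


class OffsetMLP(nn.Module):
    """Maps a relative block offset to per-channel weights (a stand-in kernel generator)."""

    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, channels)
        )

    def forward(self, offsets: torch.Tensor) -> torch.Tensor:  # (K, 3) -> (K, C)
        return self.net(offsets.float())


def link_like_aggregate(coords, feats, generator, block_size=7, reach=1):
    """coords: (N, 3) integer indices of non-empty voxels; feats: (N, C) features.

    Step 1: pool features once per occupied block (the pre-computed aggregation).
    Step 2: each voxel gathers the pooled features of the (2*reach+1)^3 blocks
    around its own block, weighted by the generator's output for the block offset.
    With block_size=7 and reach=1 every voxel covers a 21x21x21 range.
    """
    block_ids = torch.div(coords, block_size, rounding_mode="floor")     # (N, 3)
    uniq, inv = torch.unique(block_ids, dim=0, return_inverse=True)      # (B, 3), (N,)
    pooled = torch.zeros(uniq.size(0), feats.size(1))
    pooled.index_add_(0, inv, feats)                                     # sum per occupied block

    lookup = {tuple(b.tolist()): i for i, b in enumerate(uniq)}          # block coord -> row
    offsets = torch.stack(torch.meshgrid(                                # (K, 3) block offsets
        *[torch.arange(-reach, reach + 1)] * 3, indexing="ij"), dim=-1).reshape(-1, 3)
    weights = generator(offsets)                                         # (K, C)

    out = torch.zeros_like(feats)
    for n in range(coords.size(0)):
        base = block_ids[n]
        for k, off in enumerate(offsets):                                # only occupied blocks contribute
            idx = lookup.get(tuple((base + off).tolist()))
            if idx is not None:
                out[n] += weights[k] * pooled[idx]
    return out


if __name__ == "__main__":
    coords = torch.randint(0, 64, (200, 3))        # 200 non-empty voxels in a 64^3 grid
    feats = torch.randn(200, 16)
    gen = OffsetMLP(channels=16)
    print(link_like_aggregate(coords, feats, gen).shape)  # torch.Size([200, 16])
```

With block_size=7 and reach=1 each voxel sees 21 cells per axis, mirroring the 21x21x21 range quoted above. Note that this sketch simply shares one pooled feature per block; per the abstract, it is the paper's linear kernel generator that makes reusing aggregates across overlapped blocks work in a convolution-like manner.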
Related papers
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
However, densely labeling 3D point clouds for fully-supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
- LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels [62.31333169413391]
We propose the Large Sparse Kernel 3D Neural Network (LSK3DNet).
Our method comprises two core components: Spatial-wise Dynamic Sparsity (SDS) and Channel-wise Weight Selection (CWS).
arXiv Detail & Related papers (2024-03-22T12:54:33Z)
- LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs [78.25819070166351]
We propose the spatial-wise partition convolution and its large-kernel module.
Our large-kernel 3D CNN network, LargeKernel3D, yields notable improvements in 3D tasks.
For the first time, we show that large kernels are feasible and essential for 3D visual tasks.
arXiv Detail & Related papers (2022-06-21T17:35:57Z)
- Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
- Improved Pillar with Fine-grained Feature for 3D Object Detection [23.348710029787068]
3D object detection with LiDAR point clouds plays an important role in the perception module of autonomous driving.
Existing point-based methods struggle to meet the speed requirements because they process too many raw points.
The 2D grid-based methods, such as PointPillar, can easily achieve a stable and efficient speed based on simple 2D convolution.
arXiv Detail & Related papers (2021-10-12T14:53:14Z)
- Multi-Modality Task Cascade for 3D Object Detection [22.131228757850373]
Many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data.
We propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions.
We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.
arXiv Detail & Related papers (2021-07-08T17:55:01Z)
- Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% for moderately difficult data, measured at a 3D intersection over union threshold of 70%, as required by the official KITTI benchmark, outperforming previous state-of-the-art methods that use only a single RGB image by a large margin.
arXiv Detail & Related papers (2020-11-19T17:23:30Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)