LinK: Linear Kernel for LiDAR-based 3D Perception
- URL: http://arxiv.org/abs/2303.16094v1
- Date: Tue, 28 Mar 2023 16:02:30 GMT
- Title: LinK: Linear Kernel for LiDAR-based 3D Perception
- Authors: Tao Lu, Xiang Ding, Haisong Liu, Gangshan Wu, Limin Wang
- Abstract summary: We propose a new method, called LinK, to achieve a wider-range perception receptive field in a convolution-like manner with two core designs.
The proposed method successfully enables each voxel to perceive context within a range of 21x21x21.
- Score: 48.75602569945194
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Extending the success of 2D Large Kernel to 3D perception is challenging due
to: 1. the cubically-increasing overhead in processing 3D data; 2. the
optimization difficulties from data scarcity and sparsity. Previous work has
taken the first step to scale up the kernel size from 3x3x3 to 7x7x7 by
introducing block-shared weights. However, to reduce the feature variations
within a block, it only employs a modest block size and fails to achieve larger
kernels such as 21x21x21. To address this issue, we propose a new method,
called LinK, to achieve a wider-range perception receptive field in a
convolution-like manner with two core designs. The first is to replace the
static kernel matrix with a linear kernel generator, which adaptively provides
weights only for non-empty voxels. The second is to reuse the pre-computed
aggregation results in the overlapped blocks to reduce computation complexity.
The proposed method successfully enables each voxel to perceive context within
a range of 21x21x21. Extensive experiments on two basic perception tasks, 3D
object detection and 3D semantic segmentation, demonstrate the effectiveness of
our method. Notably, we rank 1st on the public leaderboard of the 3D detection
benchmark of nuScenes (LiDAR track), by simply incorporating a LinK-based
backbone into the basic detector, CenterPoint. We also boost the strong
segmentation baseline's mIoU by 2.7% on the SemanticKITTI test set. Code is
available at https://github.com/MCG-NJU/LinK.
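The two core designs described in the abstract can be illustrated with a short, self-contained sketch. The code below is not the authors' implementation (that is in the linked repository); it is a toy approximation in which a small MLP kernel generator produces weights only for the relative offsets of occupied blocks, and block-level feature sums are pre-computed once and reused by every voxel whose large neighborhood overlaps them. All names here (OffsetMLP, link_like_aggregate, block_size, reach) are illustrative assumptions, not identifiers from the paper.

```python
# Toy sketch (not the authors' code) of the two ideas in the abstract:
# (1) a kernel generator that provides weights only for non-empty voxels/blocks, and
# (2) reusing pre-computed block-level aggregates so each voxel can cover a large
#     (e.g. 21x21x21) range without visiting every cell in that range.
import torch
import torch.nn as nn


class OffsetMLP(nn.Module):
    """Maps a relative block offset to per-channel weights (a stand-in kernel generator)."""

    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, channels)
        )

    def forward(self, offsets: torch.Tensor) -> torch.Tensor:  # (K, 3) -> (K, C)
        return self.net(offsets.float())


def link_like_aggregate(coords, feats, generator, block_size=7, reach=1):
    """coords: (N, 3) integer indices of non-empty voxels; feats: (N, C) features.

    Step 1: pool features once per occupied block (the pre-computed aggregation).
    Step 2: each voxel gathers the pooled features of the (2*reach+1)^3 blocks
    around its own block, weighted by the generator's output for the block offset.
    With block_size=7 and reach=1 every voxel covers a 21x21x21 range.
    """
    block_ids = torch.div(coords, block_size, rounding_mode="floor")     # (N, 3)
    uniq, inv = torch.unique(block_ids, dim=0, return_inverse=True)      # (B, 3), (N,)
    pooled = torch.zeros(uniq.size(0), feats.size(1))
    pooled.index_add_(0, inv, feats)                                     # sum per occupied block

    lookup = {tuple(b.tolist()): i for i, b in enumerate(uniq)}          # block coord -> row
    offsets = torch.stack(torch.meshgrid(                                # (K, 3) block offsets
        *[torch.arange(-reach, reach + 1)] * 3, indexing="ij"), dim=-1).reshape(-1, 3)
    weights = generator(offsets)                                         # (K, C)

    out = torch.zeros_like(feats)
    for n in range(coords.size(0)):
        base = block_ids[n]
        for k, off in enumerate(offsets):                                # only occupied blocks contribute
            idx = lookup.get(tuple((base + off).tolist()))
            if idx is not None:
                out[n] += weights[k] * pooled[idx]
    return out


if __name__ == "__main__":
    coords = torch.randint(0, 64, (200, 3))        # 200 non-empty voxels in a 64^3 grid
    feats = torch.randn(200, 16)
    gen = OffsetMLP(channels=16)
    print(link_like_aggregate(coords, feats, gen).shape)  # torch.Size([200, 16])
```

With block_size=7 and reach=1 each voxel sees 21 cells per axis, mirroring the 21x21x21 range quoted above. Note that this sketch simply shares one pooled feature per block; per the abstract, it is the paper's linear kernel generator that makes reusing aggregates across overlapped blocks work in a convolution-like manner.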
Related papers
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
However, densely labeling 3D point clouds for fully-supervised training remains too labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
- LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels [62.31333169413391]
We propose the Large Sparse Kernel 3D Neural Network (LSK3DNet).
Our method comprises two core components: Spatial-wise Dynamic Sparsity (SDS) and Channel-wise Weight Selection (CWS).
arXiv Detail & Related papers (2024-03-22T12:54:33Z)
- LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs [78.25819070166351]
We propose the spatial-wise partition convolution and its large-kernel module.
Our large-kernel 3D CNN network, LargeKernel3D, yields notable improvements in 3D tasks.
For the first time, we show that large kernels are feasible and essential for 3D visual tasks.
arXiv Detail & Related papers (2022-06-21T17:35:57Z)
- Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
- Improved Pillar with Fine-grained Feature for 3D Object Detection [23.348710029787068]
3D object detection with LiDAR point clouds plays an important role in the perception module of autonomous driving.
Existing point-based methods struggle to meet the speed requirements because they process too many raw points.
The 2D grid-based methods, such as PointPillar, can easily achieve a stable and efficient speed based on simple 2D convolution.
arXiv Detail & Related papers (2021-10-12T14:53:14Z)
- Multi-Modality Task Cascade for 3D Object Detection [22.131228757850373]
Many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data.
We propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions.
We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.
arXiv Detail & Related papers (2021-07-08T17:55:01Z)
- Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% for moderately difficult data, measured at a 3D intersection over union threshold of 70%, as required by the official KITTI benchmark, outperforming previous state-of-the-art methods that use only a single RGB image by a large margin.
arXiv Detail & Related papers (2020-11-19T17:23:30Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)