Focal Sparse Convolutional Networks for 3D Object Detection
- URL: http://arxiv.org/abs/2204.12463v1
- Date: Tue, 26 Apr 2022 17:34:10 GMT
- Title: Focal Sparse Convolutional Networks for 3D Object Detection
- Authors: Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia
- Abstract summary: We introduce two new modules to enhance the capability of Sparse CNNs.
They are focal sparse convolution (Focals Conv) and its multi-modal variant, focal sparse convolution with fusion.
For the first time, we show that spatially learnable sparsity in sparse convolution is essential for sophisticated 3D object detection.
- Score: 121.45950754511021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-uniform 3D sparse data, e.g., point clouds or voxels at different
spatial positions, contribute to the task of 3D object detection in different
ways. Existing basic components in sparse convolutional networks (Sparse CNNs)
process all sparse data equally, whether with regular or submanifold sparse
convolution. In this paper, we introduce two new modules to enhance the
capability of Sparse CNNs, both based on making feature sparsity learnable
with position-wise importance prediction. They are focal sparse convolution
(Focals Conv) and its multi-modal variant, focal sparse convolution with
fusion, or Focals Conv-F for short. The new modules can readily substitute
their plain counterparts in existing Sparse CNNs and be jointly trained in an
end-to-end fashion. For the first time, we show that spatially learnable
sparsity in sparse convolution is essential for sophisticated 3D object
detection. Extensive experiments on the KITTI, nuScenes and Waymo benchmarks
validate the effectiveness of our approach. Without bells and whistles, our
results outperform all existing single-model entries on the nuScenes test
benchmark at the time of submission. Code and models are at
https://github.com/dvlab-research/FocalsConv.
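The central mechanism — a position-wise importance head whose scores decide which sparse features survive and which may dilate to their neighbors — can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: it stands in a dense (N, C) feature matrix for a real sparse tensor (e.g. from the spconv library), and the threshold value and layer shapes are hypothetical.

```python
import torch
import torch.nn as nn

class FocalSparseConvSketch(nn.Module):
    """Illustrative stand-in for a focal sparse convolution layer.

    Real implementations act on sparse tensors (e.g. via the spconv
    library); here the active voxels are a dense (N, C) feature matrix
    with integer coordinates (N, 3) to keep the sketch self-contained.
    """

    def __init__(self, channels: int, threshold: float = 0.5):
        super().__init__()
        # Stand-in for the 3x3x3 sparse kernel over neighboring voxels.
        self.feature_conv = nn.Linear(channels, channels)
        # Position-wise importance head: one score per active voxel.
        self.importance = nn.Linear(channels, 1)
        self.threshold = threshold  # assumed cutoff, not from the paper

    def forward(self, feats: torch.Tensor, coords: torch.Tensor):
        # coords is unused in this dense stand-in; a real sparse layer
        # would use it to find each voxel's neighbors.
        # Predict a soft importance score for every active position.
        scores = torch.sigmoid(self.importance(feats)).squeeze(-1)  # (N,)

        # Modulating features by their scores keeps the sparsity choice
        # differentiable, so the importance head trains end-to-end.
        out = self.feature_conv(feats) * scores.unsqueeze(-1)

        # High-importance ("focal") positions would be allowed to dilate
        # their outputs to neighboring voxels, as in regular sparse conv;
        # the rest behave like submanifold sparse conv. The mask below is
        # what would drive that choice in a real sparse implementation.
        focal_mask = scores > self.threshold
        return out, focal_mask
```

As a usage sketch, `out, mask = FocalSparseConvSketch(64)(torch.randn(1024, 64), torch.randint(0, 100, (1024, 3)))` gates 1024 active voxels; per the abstract, such a layer would substitute for a plain sparse convolution block in an existing Sparse CNN.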
Related papers
- Spherical Frustum Sparse Convolution Network for LiDAR Point Cloud Semantic Segmentation [62.258256483231484]
LiDAR point cloud semantic segmentation enables robots to obtain fine-grained semantic information about their surrounding environment.
Many works project the point cloud onto a 2D image and adopt 2D Convolutional Neural Networks (CNNs) or vision transformers for LiDAR point cloud semantic segmentation.
In this paper, we propose a novel spherical frustum structure to avoid quantized information loss.
arXiv Detail & Related papers (2023-11-29T09:55:13Z)
- SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection [84.09798649295038]
Given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations are noisy and inefficient.
We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations.
SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones.
arXiv Detail & Related papers (2023-04-27T17:17:39Z)
- A Closer Look at Few-Shot 3D Point Cloud Classification [21.57893885371941]
We propose a new network, Point-cloud Correlation Interaction (PCIA), with three novel plug-and-play modules: Salient-Part Fusion (SPF), Self-Channel Interaction Plus (SCI+), and Cross-Instance Fusion Plus (CIF+).
These modules can be inserted into most few-shot learning (FSL) algorithms with minor changes and significantly improve performance.
Experimental results on three benchmark datasets, ModelNet40-FS, ShapeNet70-FS, and ScanObjectNN-FS, demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-31T17:01:13Z)
- A Unified BEV Model for Joint Learning of 3D Local Features and Overlap Estimation [12.499361832561634]
We present a unified bird's-eye view (BEV) model for joint learning of 3D local features and overlap estimation.
Our method significantly outperforms existing methods on overlap prediction, especially in scenes with small overlaps.
arXiv Detail & Related papers (2023-02-28T12:01:16Z)
- Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation [66.6890991207065]
Sparse 3D convolutions have become the de facto tools for constructing deep neural networks for 3D perception.
We propose an alternative method that reaches the level of state-of-the-art methods without requiring sparse convolutions.
We show that this level of performance is achievable by relying on tools a priori deemed unfit for large-scale, high-performance 3D perception.
arXiv Detail & Related papers (2023-01-24T16:10:08Z)
- Spatial Pruned Sparse Convolution for Efficient 3D Object Detection [41.62839541489369]
3D scenes are dominated by a large number of background points, which are redundant for a detection task that mainly needs to focus on foreground objects.
In this paper, we analyze the major components of existing 3D CNNs and find that they ignore this redundancy and further amplify it during down-sampling, incurring a large amount of unnecessary computational overhead.
We propose a new convolution operator named spatial pruned sparse convolution (SPS-Conv), which includes two variants: spatial pruned submanifold sparse convolution (SPSS-Conv) and spatial pruned regular sparse convolution (SPRS-Conv).
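A minimal sketch of the pruning idea follows: drop low-importance active positions so that background voxels stop multiplying through down-sampling. The L1 feature magnitude used as the importance proxy and the keep ratio are assumptions chosen for illustration, not details taken from the paper.

```python
import torch

def spatial_prune(feats: torch.Tensor, coords: torch.Tensor, keep_ratio: float = 0.5):
    """Keep only the most important active positions of a sparse tensor.

    feats:  (N, C) features of the active voxels.
    coords: (N, 3) integer voxel coordinates, pruned in lockstep.
    Importance is approximated by the L1 feature magnitude -- an assumed
    proxy for a position's contribution to detection.
    """
    importance = feats.abs().sum(dim=1)            # (N,) per-voxel magnitude
    k = max(1, int(keep_ratio * feats.shape[0]))   # how many voxels survive
    keep = importance.topk(k).indices              # indices of retained voxels
    return feats[keep], coords[keep]
```

Applying such a pruning step before each down-sampling convolution would shrink the active set exactly where the entry argues the redundancy is amplified.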
arXiv Detail & Related papers (2022-09-28T16:19:06Z)
- SVNet: Where SO(3) Equivariance Meets Binarization on Point Cloud Representation [65.4396959244269]
The paper tackles the challenge of combining SO(3) equivariance with network binarization by designing a general framework to construct 3D learning architectures.
The proposed approach can be applied to general backbones like PointNet and DGCNN.
Experiments on ModelNet40, ShapeNet, and the real-world dataset ScanObjectNN demonstrate that the method achieves a good trade-off between efficiency, rotation robustness, and accuracy.
arXiv Detail & Related papers (2022-09-13T12:12:19Z)
- The Devils in the Point Clouds: Studying the Robustness of Point Cloud Convolutions [15.997907568429177]
This paper investigates different variants of PointConv, a convolution network on point clouds, to examine their robustness to input scale and rotation changes.
We derive a novel viewpoint-invariant descriptor by utilizing 3D geometric properties as the input to PointConv.
Experiments are conducted on the 2D MNIST and CIFAR-10 datasets as well as the 3D SemanticKITTI and ScanNet datasets.
arXiv Detail & Related papers (2021-01-19T19:32:38Z)
- SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into current state-of-the-art BEV-, voxel-, and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
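The first variant reduces to standard scaled dot-product self-attention applied over a detector's N context features; a minimal sketch is below. The layer shapes and the residual connection are illustrative assumptions, not details from the paper.

```python
import math
import torch
import torch.nn as nn

class PairwiseSelfAttention(nn.Module):
    """Scaled dot-product self-attention over N detector features."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)  # joint query/key/value projection
        self.proj = nn.Linear(dim, dim)     # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) flattened BEV, voxel, or point features.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Pairwise attention weights between all N positions.
        attn = torch.softmax(q @ k.t() / math.sqrt(q.shape[-1]), dim=-1)
        # Residual connection preserves the detector's original features.
        return x + self.proj(attn @ v)
```

The deformable variant the entry describes would replace the all-pairs attention with attention over a learned subset of sampled locations, cutting the quadratic cost.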
arXiv Detail & Related papers (2021-01-07T18:30:32Z)
- D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features [51.04841465193678]
We leverage a 3D fully convolutional network for 3D point clouds.
We propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point.
Our method achieves state-of-the-art results in both indoor and outdoor scenarios.
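The joint output the entry describes — a detection score alongside a descriptor for every point — can be sketched as two small heads on top of any point-wise backbone feature. The head sizes, the sigmoid scoring, and the descriptor normalization are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseDetectDescribeHead(nn.Module):
    """Per-point detection score and descriptor from backbone features."""

    def __init__(self, in_dim: int, desc_dim: int = 32):
        super().__init__()
        self.desc_head = nn.Linear(in_dim, desc_dim)  # descriptor branch
        self.score_head = nn.Linear(in_dim, 1)        # keypoint-score branch

    def forward(self, point_feats: torch.Tensor):
        # point_feats: (N, in_dim) features from a fully convolutional backbone.
        desc = F.normalize(self.desc_head(point_feats), dim=-1)          # unit descriptors
        score = torch.sigmoid(self.score_head(point_feats)).squeeze(-1)  # (N,) in [0, 1]
        return desc, score
```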
arXiv Detail & Related papers (2020-03-06T12:51:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.