Spherical Transformer for LiDAR-based 3D Recognition
- URL: http://arxiv.org/abs/2303.12766v1
- Date: Wed, 22 Mar 2023 17:30:14 GMT
- Title: Spherical Transformer for LiDAR-based 3D Recognition
- Authors: Xin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, Jiaya Jia
- Abstract summary: We study the varying-sparsity distribution of LiDAR points and present SphereFormer to directly aggregate information from dense close points to sparse distant ones.
We design radial window self-attention that partitions the space into multiple non-overlapping narrow and long windows.
To fit the narrow and long windows, we propose exponential splitting for fine-grained position encoding, along with dynamic feature selection to increase representation ability.
- Score: 48.44153945515335
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: LiDAR-based 3D point cloud recognition has benefited various applications.
Without specially considering the LiDAR point distribution, most current
methods suffer from information disconnection and limited receptive field,
especially for the sparse distant points. In this work, we study the
varying-sparsity distribution of LiDAR points and present SphereFormer to
directly aggregate information from dense close points to the sparse distant
ones. We design radial window self-attention that partitions the space into
multiple non-overlapping narrow and long windows. It overcomes the
disconnection issue and enlarges the receptive field smoothly and dramatically,
which significantly boosts the performance of sparse distant points. Moreover,
to fit the narrow and long windows, we propose exponential splitting to yield
fine-grained position encoding and dynamic feature selection to increase model
representation ability. Notably, our method ranks 1st on both nuScenes and
SemanticKITTI semantic segmentation benchmarks with 81.9% and 74.8% mIoU,
respectively. Also, we achieve 3rd place on the nuScenes object detection
benchmark with 72.8% NDS and 68.5% mAP. Code is available at
https://github.com/dvlab-research/SphereFormer.git.
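The abstract describes the mechanism only at a high level, so the following is a minimal numpy sketch of its two ideas: points are grouped into narrow, long radial windows by their angular coordinates (self-attention then runs within each window, connecting dense close points to sparse distant ones), and the radius is split exponentially for fine-grained position encoding. The bin counts and range limits below are illustrative assumptions, not the paper's hyperparameters.
```python
import numpy as np

def radial_window_ids(points, num_theta=64, num_phi=32):
    """Assign each point to a radial window: narrow in the two angular
    directions, spanning the full radial extent. Points in one window
    range from dense (close) to sparse (distant), so attention within
    the window aggregates information across that whole range."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    theta = np.arctan2(y, x)                                   # azimuth in [-pi, pi]
    phi = np.arccos(np.clip(z / np.maximum(r, 1e-9), -1, 1))   # polar angle in [0, pi]
    t_bin = np.minimum((theta + np.pi) / (2 * np.pi) * num_theta,
                       num_theta - 1).astype(int)
    p_bin = np.minimum(phi / np.pi * num_phi, num_phi - 1).astype(int)
    return t_bin * num_phi + p_bin                             # window id per point

def exponential_radial_bins(r, num_bins=16, r_min=1.0, r_max=80.0):
    """Split the radius exponentially: many narrow bins near the sensor,
    fewer wide bins far away, giving fine-grained position encoding
    where points are dense."""
    edges = r_min * (r_max / r_min) ** (np.arange(1, num_bins) / num_bins)
    return np.digitize(r, edges)                               # bin index in [0, num_bins)

points = np.random.uniform(-50.0, 50.0, size=(1000, 3))
win = radial_window_ids(points)            # attention would run per window id
rbin = exponential_radial_bins(np.linalg.norm(points, axis=1))
```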
Related papers
- HEDNet: A Hierarchical Encoder-Decoder Network for 3D Object Detection in Point Clouds [19.1921315424192]
3D object detection in point clouds is important for autonomous driving systems.
A primary challenge in 3D object detection stems from the sparse distribution of points within the 3D scene.
We propose HEDNet, a hierarchical encoder-decoder network for 3D object detection.
arXiv Detail & Related papers (2023-10-31T07:32:08Z)
- CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition [45.16530801796705]
CrossLoc3D is a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting.
We present CS-Campus3D, the first 3D aerial-ground cross-source dataset consisting of point cloud data from both aerial and ground LiDAR scans.
arXiv Detail & Related papers (2023-03-31T02:50:52Z)
- Super Sparse 3D Object Detection [48.684300007948906]
LiDAR-based 3D object detection contributes increasingly to long-range perception in autonomous driving.
To enable efficient long-range detection, we first propose a fully sparse object detector termed FSD.
FSD++ generates residual points, which indicate the point changes between consecutive frames.
arXiv Detail & Related papers (2023-01-05T17:03:56Z)
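The summary above describes FSD++'s residual points only briefly; one plausible minimal reading, sketched below in numpy, is to keep the points of the current frame that lie farther than some threshold from every point of the previous frame. The threshold value and the brute-force nearest-neighbor search are assumptions for illustration.
```python
import numpy as np

def residual_points(prev_frame, curr_frame, thresh=0.2):
    """Keep current-frame points that are farther than `thresh` from
    every previous-frame point, i.e. points that changed between the
    two frames. Brute-force O(N*M) distances for clarity; a real
    pipeline would use a voxel grid or KD-tree."""
    d = np.linalg.norm(curr_frame[:, None, :] - prev_frame[None, :, :], axis=-1)
    return curr_frame[d.min(axis=1) > thresh]

prev = np.random.rand(500, 3) * 10.0
curr = np.concatenate([prev[:450],                        # unchanged points
                       np.random.rand(60, 3) * 10 + 20])  # newly appeared points
res = residual_points(prev, curr)                         # ~60 residual points
```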
- Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
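As a rough illustration of a first-layer point embedding that aggregates local information before attention, the numpy sketch below max-pools each point's k nearest neighbors and concatenates the result to the point's own features. The kNN size and the pooling choice are assumptions, not the paper's exact design.
```python
import numpy as np

def first_layer_point_embedding(xyz, feats, k=8):
    """Max-pool each point's k nearest neighbors (including itself) and
    concatenate the pooled vector to the point's own features, giving
    every token local context before the first attention layer."""
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, :k]                # (N, k) neighbor indices
    local = feats[knn].max(axis=1)                    # (N, C) pooled local features
    return np.concatenate([feats, local], axis=-1)    # (N, 2C)

xyz = np.random.rand(200, 3)
feats = np.random.rand(200, 16)
emb = first_layer_point_embedding(xyz, feats)         # shape (200, 32)
```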
- POCO: Point Convolution for Surface Reconstruction [92.22371813519003]
Implicit neural networks have been successfully used for surface reconstruction from point clouds.
Many of them face scalability issues as they encode the isosurface function of a whole object or scene into a single latent vector.
We propose to use point cloud convolutions and compute latent vectors at each input point.
arXiv Detail & Related papers (2022-01-05T21:26:18Z)
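The core POCO idea, a latent vector attached to every input point and decoded at arbitrary query locations, can be miniaturized as below; a neighborhood mean stands in for the learned point convolution and inverse-distance interpolation for the learned decoder, both assumptions for illustration.
```python
import numpy as np

def per_point_latents(xyz, feats, k=8):
    """A latent vector per input point: here the mean over each point's
    k nearest neighbors, standing in for a learned point convolution."""
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, :k]
    return feats[knn].mean(axis=1)                    # (N, C)

def query_latent(query, xyz, latents, k=4):
    """Decode at an arbitrary 3D query by inverse-distance interpolation
    of nearby points' latents; a small MLP would map the result to an
    occupancy value in the full method."""
    d = np.linalg.norm(xyz - query, axis=1)
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] + 1e-9)
    return (latents[nn] * (w / w.sum())[:, None]).sum(axis=0)

xyz = np.random.rand(300, 3)
latents = per_point_latents(xyz, np.random.rand(300, 16))
z = query_latent(np.array([0.5, 0.5, 0.5]), xyz, latents)   # (16,) latent
```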
- VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the 'virtual' points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z)
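As a much-simplified stand-in for aligning image data with point cloud locations, the sketch below projects points into the image with hypothetical camera intrinsics and reads the feature at each projected pixel; VPFNet's actual virtual-point aggregation is learned, so treat this purely as an alignment illustration.
```python
import numpy as np

def sample_image_features_at_points(points, feat_map, K):
    """Project 3D points into the image with intrinsics K and read the
    image feature at each projected pixel (nearest-pixel sampling).
    Points are assumed to be in the camera frame with z > 0."""
    uvw = (K @ points.T).T                            # homogeneous pixel coords
    uv = uvw[:, :2] / uvw[:, 2:3]
    h, w, _ = feat_map.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return feat_map[v, u]                             # (N, C) per-point image feature

K = np.array([[700.0, 0.0, 320.0],                    # hypothetical intrinsics
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
pts = np.random.rand(100, 3) * [4.0, 3.0, 1.0] + [-2.0, -1.5, 5.0]
img_feats = sample_image_features_at_points(pts, np.random.rand(480, 640, 32), K)
```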
- TransLoc3D: Point Cloud based Large-scale Place Recognition using Adaptive Receptive Fields [40.55971834919629]
We argue that fixed receptive fields are not well suited for place recognition.
We propose a novel Adaptive Receptive Field Module (ARFM), which can adaptively adjust the size of the receptive field based on the input point cloud.
We also present a novel network architecture, named TransLoc3D, to obtain discriminative global descriptors of point clouds.
arXiv Detail & Related papers (2021-05-25T01:54:31Z)
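One plausible reading of an adaptive receptive field, sketched below, is to estimate local density from the distance to each point's k-th neighbor and pick a per-point neighborhood radius accordingly, so sparse regions see farther; ARFM itself is a learned module, and the candidate scales here are assumptions.
```python
import numpy as np

def adaptive_radius(xyz, k=16, scales=(0.5, 1.0, 2.0)):
    """Estimate local density from the distance to each point's k-th
    neighbor and snap it to the nearest candidate scale, so points in
    sparse regions get a larger receptive-field radius."""
    d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    kth = np.sort(d, axis=1)[:, k]                     # distance to k-th neighbor
    s = np.asarray(scales)
    return s[np.abs(kth[:, None] - s).argmin(axis=1)]  # per-point radius

xyz = np.random.rand(400, 3) * 20.0
radii = adaptive_radius(xyz)   # features would then be gathered within radii[i]
```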
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation [81.02742110604161]
State-of-the-art methods for large-scale driving-scene LiDAR segmentation often project the point clouds to 2D space and then process them via 2D convolution.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
Our method achieves 1st place on the SemanticKITTI leaderboard and outperforms existing methods on nuScenes by a noticeable margin of about 4%.
arXiv Detail & Related papers (2020-11-19T18:53:11Z)
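The cylindrical partition is straightforward to illustrate: bin points by (rho, phi, z) instead of (x, y, z), so cells farther from the sensor cover proportionally larger areas and keep more uniform point counts. The grid sizes and ranges below are illustrative assumptions, not Cylinder3D's exact configuration.
```python
import numpy as np

def cylindrical_voxel_ids(points, num_rho=48, num_phi=36, num_z=8,
                          rho_max=50.0, z_min=-4.0, z_max=2.0):
    """Bin points by (rho, phi, z) instead of a Cartesian grid; cells
    farther from the sensor cover larger areas, matching the radially
    decreasing point density. Out-of-range points clip to edge bins."""
    rho = np.linalg.norm(points[:, :2], axis=1)
    phi = np.arctan2(points[:, 1], points[:, 0])
    r_bin = np.clip((rho / rho_max * num_rho).astype(int), 0, num_rho - 1)
    p_bin = np.clip(((phi + np.pi) / (2 * np.pi) * num_phi).astype(int),
                    0, num_phi - 1)
    z_bin = np.clip(((points[:, 2] - z_min) / (z_max - z_min) * num_z).astype(int),
                    0, num_z - 1)
    return (r_bin * num_phi + p_bin) * num_z + z_bin  # voxel id per point

xy = np.random.uniform(-50.0, 50.0, size=(1000, 2))
z = np.random.uniform(-4.0, 2.0, size=(1000, 1))
vox = cylindrical_voxel_ids(np.hstack([xy, z]))
```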
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.