SWFormer: Sparse Window Transformer for 3D Object Detection in Point
Clouds
- URL: http://arxiv.org/abs/2210.07372v1
- Date: Thu, 13 Oct 2022 21:37:53 GMT
- Title: SWFormer: Sparse Window Transformer for 3D Object Detection in Point
Clouds
- Authors: Pei Sun, Mingxing Tan, Weiyue Wang, Chenxi Liu, Fei Xia, Zhaoqi Leng,
and Dragomir Anguelov
- Abstract summary: 3D object detection in point clouds is a core component for modern robotics and autonomous driving systems.
A key challenge in 3D object detection comes from the inherent sparse nature of point occupancy within the 3D scene.
We propose Sparse Window Transformer (SWFormer), a scalable and accurate model for 3D object detection.
- Score: 44.635939022626744
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: 3D object detection in point clouds is a core component for modern robotics
and autonomous driving systems. A key challenge in 3D object detection comes
from the inherent sparse nature of point occupancy within the 3D scene. In this
paper, we propose Sparse Window Transformer (SWFormer), a scalable and
accurate model for 3D object detection, which can take full advantage of the
sparsity of point clouds. Built upon the idea of window-based Transformers,
SWFormer converts 3D points into sparse voxels and windows, and then processes
these variable-length sparse windows efficiently using a bucketing scheme. In
addition to self-attention within each spatial window, our SWFormer also
captures cross-window correlation with multi-scale feature fusion and window
shifting operations. To further address the unique challenge of detecting 3D
objects accurately from sparse features, we propose a new voxel diffusion
technique. Experimental results on the Waymo Open Dataset show our SWFormer
achieves state-of-the-art 73.36 L2 mAPH on vehicle and pedestrian for 3D object
detection on the official test set, outperforming all previous single-stage and
two-stage models, while being much more efficient.
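The efficiency claim above rests on the bucketing scheme for variable-length sparse windows. The following PyTorch sketch illustrates one plausible reading of that idea, based only on this abstract: the window size, bucket capacities, coordinate hashing, and per-window loop are illustrative assumptions, not the released SWFormer implementation, and multi-scale fusion, window shifting, and voxel diffusion are omitted.

```python
# Hypothetical sketch of bucketed sparse-window self-attention; names and
# hyperparameters are assumptions, not the authors' code.
import torch
import torch.nn as nn

def window_ids(coords: torch.Tensor, window: int = 10) -> torch.Tensor:
    """Hash non-negative sparse BEV voxel coordinates (N, 2) to window ids."""
    return (coords[:, 0] // window) * 4096 + coords[:, 1] // window

def bucketed_window_attention(feats: torch.Tensor, coords: torch.Tensor,
                              attn: nn.MultiheadAttention, window: int = 10,
                              buckets: tuple = (16, 64, 256)) -> torch.Tensor:
    """Self-attention within each sparse window, padding each window only to
    the smallest bucket capacity that fits it, rather than to the global
    maximum window size."""
    out = torch.zeros_like(feats)
    ids = window_ids(coords, window)
    for wid in ids.unique():  # a real implementation would batch all windows
        idx = (ids == wid).nonzero(as_tuple=True)[0]  # of a bucket together
        n = idx.numel()
        cap = next((b for b in buckets if b >= n), n)
        x = feats.new_zeros(1, cap, feats.shape[-1])
        x[0, :n] = feats[idx]
        pad = torch.ones(1, cap, dtype=torch.bool, device=feats.device)
        pad[0, :n] = False  # True marks padded slots to be ignored
        y, _ = attn(x, x, x, key_padding_mask=pad)
        out[idx] = y[0, :n]
    return out

# Toy usage: 500 non-empty voxels with 128-d features on a 200x200 BEV grid.
feats = torch.randn(500, 128)
coords = torch.randint(0, 200, (500, 2))
attn = nn.MultiheadAttention(128, num_heads=4, batch_first=True)
out = bucketed_window_attention(feats, coords, attn)
```

Padding each window to its bucket capacity instead of to the single largest window bounds the wasted computation, which is presumably what lets the model scale to the very uneven occupancy of outdoor scenes.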
Related papers
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
Monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
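VFMM3D's image-to-point-cloud step is only described at a high level above. Pipelines of this kind typically rest on pinhole unprojection of a predicted depth map (here the depth would come from a vision foundation model); the sketch below shows that standard step under assumed names and shapes, not VFMM3D's actual code.

```python
# Minimal pseudo-LiDAR-style unprojection sketch; an assumption about the
# pipeline, not code from the VFMM3D paper.
import numpy as np

def depth_to_points(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Unproject an (H, W) depth map through pinhole intrinsics K (3x3)
    into an (H*W, 3) point cloud in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

# Toy usage: a flat 10 m depth map with round-number intrinsics.
K = np.array([[720.0, 0.0, 640.0],
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])
points = depth_to_points(np.full((720, 1280), 10.0), K)
```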
- Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of object size to input scene size is significantly smaller than in 2D detection.
Many 3D detectors directly follow the common practice of 2D detectors, downsampling the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network.
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
- Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
We overcome this by converting the voxel-based sparse 3D feature volumes into sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of bounding box regression (one possible form is sketched below).
arXiv Detail & Related papers (2021-08-08T13:42:13Z)
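The summary above does not spell out the re-calibration formula. A common form for such schemes, given here as an assumption rather than the paper's exact method, blends the classification score with a predicted IoU so that confidence tracks localization quality:

```python
# Hypothetical IoU-rectified confidence; formula and names are assumptions.
import numpy as np

def recalibrate_confidence(cls_score: np.ndarray, pred_iou: np.ndarray,
                           alpha: float = 0.5) -> np.ndarray:
    """Blend classification confidence with predicted box IoU; alpha trades
    off classification confidence against localization quality."""
    pred_iou = np.clip(pred_iou, 0.0, 1.0)
    return cls_score ** (1.0 - alpha) * pred_iou ** alpha

# A box with a high class score but poor predicted overlap is down-weighted.
print(recalibrate_confidence(np.array([0.9, 0.9]), np.array([0.9, 0.3])))
```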
- PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
- RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm that projects a set of 3D Regions of Interest (RoIs) from the point cloud onto the corresponding 2D RoIs in the images (a geometric sketch follows below).
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z)
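Projecting a 3D RoI onto the image, as RoIFusion's summary describes, reduces to projecting box corners through the camera matrix. A minimal geometric sketch, assuming an axis-aligned box in the camera frame and a KITTI-style 3x4 projection matrix P, with yaw ignored for brevity:

```python
# Hypothetical 3D-box-to-2D-RoI projection; box parametrization and names
# are assumptions, not the RoIFusion paper's code.
import numpy as np

def box3d_to_roi2d(center, size, P):
    """Project the 8 corners of an axis-aligned 3D box through a 3x4
    camera projection matrix and return the enclosing 2D RoI."""
    dx, dy, dz = size
    offsets = np.array([[sx * dx, sy * dy, sz * dz]
                        for sx in (-0.5, 0.5)
                        for sy in (-0.5, 0.5)
                        for sz in (-0.5, 0.5)])
    corners = np.asarray(center) + offsets                   # (8, 3)
    hom = np.hstack([corners, np.ones((8, 1))]) @ P.T        # (8, 3)
    uv = hom[:, :2] / hom[:, 2:3]
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])  # [x1, y1, x2, y2]

# Toy usage: a 4 m long, 2 m wide, 1.5 m tall box 20 m ahead of the camera.
P = np.array([[720.0, 0.0, 640.0, 0.0],
              [0.0, 720.0, 360.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
print(box3d_to_roi2d([0.0, 0.0, 20.0], [4.0, 1.5, 2.0], P))
```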
- Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometry Constrained Keypoints in Real-Time [6.82446891805815]
We propose a novel 3D single-shot object detection method for detecting vehicles in monocular RGB images.
Our approach lifts 2D detections to 3D space by predicting additional regression and classification parameters.
We test our approach on different datasets for autonomous driving and evaluate it using the challenging KITTI 3D Object Detection and the novel nuScenes Object Detection benchmarks.
arXiv Detail & Related papers (2020-06-23T15:10:19Z)
- Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection [40.34710686994996]
3D object detection is an emerging task in autonomous driving scenarios.
Previous works process 3D point clouds using either projection-based or voxel-based models.
We propose the Stereo RGB and Deeper LIDAR framework which can utilize semantic and spatial information simultaneously.
arXiv Detail & Related papers (2020-06-09T11:19:24Z)
- DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes [54.239416488865565]
We propose a fast single-stage 3D object detection method for LIDAR data.
The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes.
Our method improves the state of the art by 5% on object detection in ScanNet scenes and achieves top results by 3.4% on the Waymo Open Dataset.
arXiv Detail & Related papers (2020-04-02T17:48:50Z)