PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer
- URL: http://arxiv.org/abs/2305.06621v1
- Date: Thu, 11 May 2023 07:37:15 GMT
- Title: PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer
- Authors: Honghui Yang and Wenxiao Wang and Minghao Chen and Binbin Lin and Tong He and Hua Chen and Xiaofei He and Wanli Ouyang
- Abstract summary: We present a novel Point-Voxel Transformer for single-stage 3D detection (PVT-SSD)
We propose a Point-Voxel Transformer (PVT) module that obtains long-range contexts in a cheap manner from voxels.
The experiments on several autonomous driving benchmarks verify the effectiveness and efficiency of the proposed method.
- Score: 75.2251801053839
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent Transformer-based 3D object detectors learn point cloud features
either from point- or voxel-based representations. However, the former requires
time-consuming sampling while the latter introduces quantization errors. In
this paper, we present a novel Point-Voxel Transformer for single-stage 3D
detection (PVT-SSD) that takes advantage of these two representations.
Specifically, we first use voxel-based sparse convolutions for efficient
feature encoding. Then, we propose a Point-Voxel Transformer (PVT) module that
obtains long-range contexts in a cheap manner from voxels while attaining
accurate positions from points. The key to associating the two different
representations is our input-dependent Query Initialization module, which
efficiently generates reference points and content queries. Next,
PVT adaptively fuses long-range contextual and local geometric information
around reference points into content queries. Further, to quickly find the
neighboring points of reference points, we design the Virtual Range Image
module, which generalizes the native range image to multi-sensor and
multi-frame settings. Experiments on several autonomous driving benchmarks verify
the effectiveness and efficiency of the proposed method. Code will be available
at https://github.com/Nightmare-n/PVT-SSD.
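To make the pipeline concrete, here is a minimal PyTorch sketch of the core point-voxel query idea. It is an illustration, not the authors' implementation: the module name, tensor layouts, and the assumption that each reference point's K neighbors are already gathered are all hypothetical. Content queries placed at reference points cross-attend to voxel features for long-range context and to nearby raw points for precise geometry, and the two results are fused.

```python
import torch
import torch.nn as nn

class PointVoxelQueryFusion(nn.Module):
    """Illustrative sketch of a point-voxel query block (not the official PVT-SSD code).

    Content queries, initialized at reference points, gather:
      * long-range context via cross-attention over (sparse) voxel features,
      * local geometry via cross-attention over the K nearest raw points.
    """

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.voxel_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.point_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.ReLU(),
                                  nn.Linear(d_model, d_model))

    def forward(self, queries, voxel_feats, point_feats):
        # queries:     (B, Q, C)    content queries at reference points
        # voxel_feats: (B, V, C)    features of non-empty voxels (long-range context)
        # point_feats: (B, Q, K, C) features of K neighbors of each reference point
        ctx, _ = self.voxel_attn(queries, voxel_feats, voxel_feats)  # (B, Q, C)
        B, Q, K, C = point_feats.shape
        q = queries.reshape(B * Q, 1, C)   # one query per reference point
        p = point_feats.reshape(B * Q, K, C)
        loc, _ = self.point_attn(q, p, p)  # (B*Q, 1, C) local geometry
        loc = loc.reshape(B, Q, C)
        return self.fuse(torch.cat([ctx, loc], dim=-1))  # fused content queries

# Toy usage with random tensors.
B, Q, V, K, C = 2, 16, 256, 8, 128
block = PointVoxelQueryFusion(d_model=C)
out = block(torch.randn(B, Q, C), torch.randn(B, V, C), torch.randn(B, Q, K, C))
print(out.shape)  # torch.Size([2, 16, 128])
```

In the paper, the neighbor lookup this sketch takes for granted is exactly what the Virtual Range Image module accelerates, and the queries themselves come from the Query Initialization module rather than from random tensors.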
Related papers
- PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection [36.04323550267339]
3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars.
We propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection.
arXiv Detail & Related papers (2024-05-05T04:44:41Z)
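The contrast PVTransformer draws with pooling-based PointNet encoders can be sketched directly: instead of max-pooling the features of the points inside a voxel, a learned per-voxel query attends over them. The module name, shapes, and padding scheme below are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class AttentivePointToVoxel(nn.Module):
    """Sketch: aggregate a voxel's points with attention instead of max-pooling."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.voxel_query = nn.Parameter(torch.randn(1, 1, d_model))  # learned aggregation query
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, point_feats, padding_mask):
        # point_feats:  (NV, P, C) features of up to P points in each of NV voxels
        # padding_mask: (NV, P) True where a slot is padding, not a real point
        q = self.voxel_query.expand(point_feats.size(0), -1, -1)
        voxel_feat, _ = self.attn(q, point_feats, point_feats,
                                  key_padding_mask=padding_mask)
        return voxel_feat.squeeze(1)  # (NV, C) one feature per voxel

NV, P, C = 100, 32, 64
mask = torch.zeros(NV, P, dtype=torch.bool)
mask[:, 20:] = True  # pretend the last 12 slots of every voxel are padding
feats = AttentivePointToVoxel(C)(torch.randn(NV, P, C), mask)
print(feats.shape)  # torch.Size([100, 64])
```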
- V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection [73.37781484123536]
We introduce a highly performant 3D object detector for point clouds using the DETR framework.
To address the lack of locality in plain attention, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method.
We show exceptional results on the challenging ScanNetV2 benchmark.
arXiv Detail & Related papers (2023-08-08T17:14:14Z)
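A simplified sketch of the vertex-relative idea: each point is encoded by its offsets to the eight vertices of the box a query currently predicts, so attention can favor points near that box. Axis-aligned boxes, the corner ordering, and the MLP encoder are assumptions; the paper handles rotated boxes.

```python
import torch
import torch.nn as nn

def box_corners(center, size):
    # center, size: (Q, 3) axis-aligned boxes for simplicity
    signs = torch.tensor([[sx, sy, sz] for sx in (-1, 1)
                          for sy in (-1, 1) for sz in (-1, 1)],
                         dtype=center.dtype)                      # (8, 3)
    return center[:, None, :] + 0.5 * size[:, None, :] * signs   # (Q, 8, 3)

class VertexRelativePE(nn.Module):
    """Sketch of vertex-relative position encoding for query-point attention."""

    def __init__(self, d_model: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(8 * 3, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, points, center, size):
        # points: (N, 3) scene points; center/size: (Q, 3) per-query box predictions
        corners = box_corners(center, size)                      # (Q, 8, 3)
        rel = points[None, :, None, :] - corners[:, None, :, :]  # (Q, N, 8, 3)
        return self.mlp(rel.flatten(-2))                         # (Q, N, d_model)

pe = VertexRelativePE()
enc = pe(torch.randn(500, 3), torch.randn(4, 3), torch.rand(4, 3) + 0.5)
print(enc.shape)  # torch.Size([4, 500, 128])
```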
- PV-RCNN++: Semantical Point-Voxel Feature Interaction for 3D Object Detection [22.6659359032306]
This paper proposes a novel object detection network by semantical point-voxel feature interaction, dubbed PV-RCNN++.
Experiments on the KITTI dataset show that PV-RCNN++ achieves 81.60%, 40.18%, and 68.21% 3D mAP on Car, Pedestrian, and Cyclist, respectively, comparable to or better than the state of the art.
arXiv Detail & Related papers (2022-08-29T08:14:00Z)
- VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images [90.60881721134656]
We propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT).
Experiments on the KITTI Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values.
arXiv Detail & Related papers (2022-06-06T14:02:06Z)
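VPIT's voxel pseudo image can be pictured as scattering per-pillar features into a dense bird's-eye-view grid, after which 2D siamese-style tracking machinery applies. The grid size and the last-write-wins handling of duplicate cells below are illustrative assumptions.

```python
import torch

def pillars_to_pseudo_image(coords, feats, H=128, W=128):
    """Scatter per-pillar features into a dense BEV pseudo-image (illustrative).

    coords: (N, 2) integer (row, col) BEV cell of each non-empty pillar
    feats:  (N, C) feature vector of each pillar
    """
    C = feats.size(1)
    image = torch.zeros(C, H, W, dtype=feats.dtype)
    # Channels-first scatter; if two pillars map to one cell, the last write wins.
    image[:, coords[:, 0], coords[:, 1]] = feats.t()
    return image  # (C, H, W), ready for a 2D-style tracker

n = 200
coords = torch.randint(0, 128, (n, 2))
pseudo = pillars_to_pseudo_image(coords, torch.randn(n, 32))
print(pseudo.shape)  # torch.Size([32, 128, 128])
```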
- Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on the S3DIS, ScanNetV2, and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
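The stratified sampling behind Stratified Transformer, dense keys nearby and sparse keys far away, can be sketched per query point as below. The radii and stride are illustrative assumptions; the paper itself samples keys from shifted 3D windows rather than radial shells.

```python
import torch

def stratified_keys(points, query, r_near=1.0, r_far=3.0, far_stride=8):
    """Sketch of stratified key sampling for one query point.

    Keep all points within r_near (dense, local detail) and every
    far_stride-th point between r_near and r_far (sparse, long-range context).
    """
    dist = torch.linalg.norm(points - query, dim=1)
    near = torch.nonzero(dist < r_near, as_tuple=True)[0]
    far = torch.nonzero((dist >= r_near) & (dist < r_far), as_tuple=True)[0]
    far = far[::far_stride]  # sparse sampling of distant points
    return torch.cat([near, far])  # indices of keys this query attends to

pts = torch.randn(2048, 3) * 2.0
keys = stratified_keys(pts, pts[0])
print(len(keys), "keys selected for one query")
```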
- Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds [16.69887974230884]
Transformer has demonstrated promising performance in many 2D vision tasks.
It is cumbersome to compute self-attention on large-scale point cloud data because a point cloud is a long sequence that is unevenly distributed in 3D space.
Existing methods usually compute self-attention locally by grouping the points into clusters of the same size, or perform convolutional self-attention on a discretized representation.
We propose a novel voxel-based architecture, namely Voxel Set Transformer (VoxSeT), to detect 3D objects from point clouds by means of set-to-set translation.
arXiv Detail & Related papers (2022-03-19T12:31:46Z)
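VoxSeT's set-to-set translation can be illustrated with attention through a small set of learned latent codes: the latents summarize an arbitrarily long point sequence, and the points read the summary back, which bounds the attention cost. The latent count and shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class InducedSetAttention(nn.Module):
    """Sketch of set-to-set attention through a small latent bottleneck."""

    def __init__(self, d_model: int = 64, n_latents: int = 8, n_heads: int = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(1, n_latents, d_model))
        self.encode = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.decode = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        # x: (B, N, C) a point set of any length N
        lat = self.latents.expand(x.size(0), -1, -1)
        lat, _ = self.encode(lat, x, x)    # latents summarize the set: O(N * n_latents)
        out, _ = self.decode(x, lat, lat)  # points read the summary back
        return out                         # (B, N, C), same set size in and out

block = InducedSetAttention()
print(block(torch.randn(2, 1000, 64)).shape)  # torch.Size([2, 1000, 64])
```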
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA proves effective at identifying valuable points related to foreground objects and at improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
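SASA's semantics-guided sampling can be sketched as farthest point sampling whose distances are weighted by predicted foreground scores, so confident foreground points preferentially survive down-sampling. The exact weighting below is an assumption for illustration.

```python
import torch

def semantics_guided_fps(points, fg_scores, n_samples):
    """Sketch: farthest point sampling with distances scaled by foreground scores.

    points:    (N, 3) coordinates
    fg_scores: (N,) estimated per-point foreground probability in [0, 1]
    """
    N = points.size(0)
    chosen = torch.zeros(n_samples, dtype=torch.long)
    chosen[0] = torch.argmax(fg_scores)  # start from the most confident point
    min_dist = torch.full((N,), float("inf"))
    for i in range(1, n_samples):
        d = torch.linalg.norm(points - points[chosen[i - 1]], dim=1)
        min_dist = torch.minimum(min_dist, d)
        # Weight geometric spread by semantics so foreground points win ties.
        chosen[i] = torch.argmax(min_dist * fg_scores)
    return chosen  # indices of retained points

pts = torch.randn(4096, 3)
scores = torch.rand(4096)
idx = semantics_guided_fps(pts, scores, n_samples=512)
print(idx.shape)  # torch.Size([512])
```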
- Voxel Transformer for 3D Object Detection [133.34678177431914]
Voxel Transformer (VoTr) is a novel and effective voxel-based Transformer backbone for 3D object detection from point clouds.
Our proposed VoTr shows consistent improvement over the convolutional baselines while maintaining computational efficiency on the KITTI dataset and the Waymo Open Dataset.
arXiv Detail & Related papers (2021-09-06T14:10:22Z)
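As a rough stand-in for what a voxel-based Transformer backbone computes, the sketch below runs self-attention over the non-empty voxels only, with a simple coordinate encoding; attending over occupied voxels rather than the dense grid is what keeps the cost manageable. VoTr's actual attention neighborhoods are more structured than this full all-pairs version, so treat it as a simplified illustration.

```python
import torch
import torch.nn as nn

class SparseVoxelSelfAttention(nn.Module):
    """Sketch: self-attention over non-empty voxels only, with coordinate encoding."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.pos = nn.Linear(3, d_model)  # encode integer voxel coordinates
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, feats, coords):
        # feats:  (1, V, C) features of the V non-empty voxels
        # coords: (1, V, 3) their integer grid coordinates
        x = feats + self.pos(coords.float())
        out, _ = self.attn(x, x, x)  # cost scales with V, not the full dense grid
        return out

V, C = 300, 64  # e.g. 300 occupied voxels out of a 200x200x16 grid
attn = SparseVoxelSelfAttention(C)
print(attn(torch.randn(1, V, C), torch.randint(0, 200, (1, V, 3))).shape)
```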