PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection
- URL: http://arxiv.org/abs/2405.02811v1
- Date: Sun, 5 May 2024 04:44:41 GMT
- Title: PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection
- Authors: Zhaoqi Leng, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan,
- Abstract summary: 3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars.
We propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection.
- Score: 36.04323550267339
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: 3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars. In this paper, we identify that the common PointNet design introduces an information bottleneck that limits 3D object detection accuracy and scalability. To address this limitation, we propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection. Our key idea is to replace the PointNet pooling operation with an attention module, leading to a better point-to-voxel aggregation function. Our design respects the permutation invariance of sparse 3D points while being more expressive than the pooling-based PointNet. Experimental results show our PVTransformer achieves much better performance compared to the latest 3D object detectors. On the widely used Waymo Open Dataset, our PVTransformer achieves state-of-the-art 76.5 mAPH L2, outperforming the prior art of SWFormer by +1.7 mAPH L2.
Related papers
- V-DETR: DETR with Vertex Relative Position Encoding for 3D Object
Detection [73.37781484123536]
We introduce a highly performant 3D object detector for point clouds using the DETR framework.
To address the limitation, we introduce a novel 3D Relative Position (3DV-RPE) method.
We show exceptional results on the challenging ScanNetV2 benchmark.
arXiv Detail & Related papers (2023-08-08T17:14:14Z) - PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer [75.2251801053839]
We present a novel Point-Voxel Transformer for single-stage 3D detection (PVT-SSD)
We propose a Point-Voxel Transformer (PVT) module that obtains long-range contexts in a cheap manner from voxels.
The experiments on several autonomous driving benchmarks verify the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2023-05-11T07:37:15Z) - OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for
Multi-Camera 3D Object Detection [78.38062015443195]
OA-BEV is a network that can be plugged into the BEV-based 3D object detection framework.
Our method achieves consistent improvements over the BEV-based baselines in terms of both average precision and nuScenes detection score.
arXiv Detail & Related papers (2023-01-13T06:02:31Z) - Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z) - PV-RCNN++: Semantical Point-Voxel Feature Interaction for 3D Object
Detection [22.6659359032306]
This paper proposes a novel object detection network by semantical point-voxel feature interaction, dubbed PV-RCNN++.
Experiments on the KITTI dataset show that PV-RCNN++ achieves 81.60$%$, 40.18$%$, 68.21$%$ 3D mAP on Car, Pedestrian, and Cyclist, achieving comparable or even better performance to the state-of-the-arts.
arXiv Detail & Related papers (2022-08-29T08:14:00Z) - RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds.
We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays.
Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
arXiv Detail & Related papers (2022-04-05T14:42:57Z) - PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector
Representation for 3D Object Detection [100.60209139039472]
We propose the PointVoxel Region based Convolution Neural Networks (PVRCNNs) for accurate 3D detection from point clouds.
Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Open dataset and the highly-competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z) - 3D Object Detection with Pointformer [29.935891419574602]
We propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively.
A Local Transformer module is employed to model interactions among points in a local region, which learns context-dependent region features at an object level.
A Global Transformer is designed to learn context-aware representations at the scene level.
arXiv Detail & Related papers (2020-12-21T15:12:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.