Point Transformer V2: Grouped Vector Attention and Partition-based
Pooling
- URL: http://arxiv.org/abs/2210.05666v2
- Date: Wed, 12 Oct 2022 17:44:57 GMT
- Title: Point Transformer V2: Grouped Vector Attention and Partition-based
Pooling
- Authors: Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao
- Abstract summary: We analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model.
In particular, we first propose grouped vector attention, which is more effective than the previous version of vector attention.
Our model achieves better performance than its predecessor and achieves state-of-the-art on several challenging 3D point cloud understanding benchmarks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a pioneering work exploring transformer architecture for 3D point cloud
understanding, Point Transformer achieves impressive results on multiple highly
competitive benchmarks. In this work, we analyze the limitations of the Point
Transformer and propose our powerful and efficient Point Transformer V2 model
with novel designs that overcome the limitations of previous work. In
particular, we first propose grouped vector attention, which is more effective
than the previous version of vector attention. Inheriting the advantages of
both learnable weight encoding and multi-head attention, we present a highly
effective implementation of grouped vector attention with a novel grouped
weight encoding layer. We also strengthen the position information for
attention by an additional position encoding multiplier. Furthermore, we design
novel and lightweight partition-based pooling methods which enable better
spatial alignment and more efficient sampling. Extensive experiments show that
our model achieves better performance than its predecessor and achieves
state-of-the-art on several challenging 3D point cloud understanding
benchmarks, including 3D point cloud segmentation on ScanNet v2 and S3DIS and
3D point cloud classification on ModelNet40. Our code will be available at
https://github.com/Gofinge/PointTransformerV2.
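The two novelties summarized above, grouped vector attention and partition-based (grid) pooling, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `weight_enc` is a single linear map standing in for the paper's grouped weight encoding layer (an MLP in practice), grid pooling fuses points by simple averaging, and all function names and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_vector_attention(q, k, v, delta, n_groups, weight_enc):
    """Grouped vector attention for a single query point (sketch).

    q          : (c,)   query feature
    k, v       : (n, c) key/value features of the n neighbours
    delta      : (n, c) relative position encoding
    n_groups   : number of channel groups g (c must be divisible by g)
    weight_enc : grouped weight encoding, maps (n, c) -> (n, g)
    """
    n, c = k.shape
    r = q[None, :] - k + delta               # relation vectors, (n, c)
    w = softmax(weight_enc(r), axis=0)       # one weight per group, normalized over neighbours
    w = np.repeat(w, c // n_groups, axis=1)  # share each group weight across its channels
    return (w * (v + delta)).sum(axis=0)     # aggregated feature, (c,)

def grid_pool(points, feats, cell):
    """Partition-based pooling sketch: points falling in the same uniform
    grid cell are fused (here by averaging, an assumption) into one point."""
    keys = np.floor(points / cell).astype(int)
    cells, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.ravel()
    counts = np.bincount(inv, minlength=len(cells)).astype(float)[:, None]
    pooled_p = np.zeros((len(cells), points.shape[1]))
    pooled_f = np.zeros((len(cells), feats.shape[1]))
    np.add.at(pooled_p, inv, points)         # scatter-add per cell
    np.add.at(pooled_f, inv, feats)
    return pooled_p / counts, pooled_f / counts

# Toy usage: 5 neighbours, 8 channels, 2 groups, linear weight encoding.
rng = np.random.default_rng(0)
c, n, g = 8, 5, 2
W = rng.normal(size=(c, g)) / np.sqrt(c)     # hypothetical weight-encoding matrix
out = grouped_vector_attention(rng.normal(size=c), rng.normal(size=(n, c)),
                               rng.normal(size=(n, c)), rng.normal(size=(n, c)),
                               g, lambda r: r @ W)
pts, fts = grid_pool(rng.uniform(0, 2, size=(100, 3)), rng.normal(size=(100, 8)), cell=1.0)
```

With `n_groups = c` this degenerates to the per-channel vector attention of the original Point Transformer, while smaller `n_groups` shares weights within channel groups, which is the parameter-efficiency knob the paper exploits.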
Related papers
- PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection
3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars.
We propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection.
arXiv Detail & Related papers (2024-05-05T04:44:41Z)
- Hierarchical Point Attention for Indoor 3D Object Detection
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z)
- Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
We present Progressive Point Patch Embedding and a new point cloud Transformer model named PViT.
PViT shares the same backbone as the standard Transformer but is shown to be less data-hungry, enabling Transformers to achieve performance comparable to the state of the art.
We formulate a simple yet effective pipeline dubbed "Pix4Point" that allows harnessing Transformers pretrained in the image domain to enhance downstream point cloud understanding.
arXiv Detail & Related papers (2022-08-25T17:59:29Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning
This work adopts transformers and incorporates them into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- Stratified Transformer for 3D Point Cloud Segmentation
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- 3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
This paper presents a novel hierarchical framework that incorporates convolution with Transformer for point cloud classification.
Our method achieves state-of-the-art classification performance, in terms of both accuracy and efficiency.
arXiv Detail & Related papers (2022-03-02T02:42:14Z)
- Deep Point Cloud Reconstruction
Point clouds obtained from 3D scanning are often sparse, noisy, and irregular.
To cope with these issues, recent studies have separately sought to densify, denoise, and complete inaccurate point clouds.
We propose a deep point cloud reconstruction network consisting of two stages: 1) a 3D sparse stacked-hourglass network for initial densification and denoising, and 2) a refinement via transformers that converts the discrete voxels into 3D points.
arXiv Detail & Related papers (2021-11-23T07:53:28Z)
- PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection
We propose the Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for accurate 3D detection from point clouds.
Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z)
- The Devils in the Point Clouds: Studying the Robustness of Point Cloud Convolutions
This paper investigates different variants of PointConv, a convolution network on point clouds, to examine their robustness to input scale and rotation changes.
We derive a novel viewpoint-invariant descriptor by utilizing 3D geometric properties as the input to PointConv.
Experiments are conducted on the 2D MNIST & CIFAR-10 datasets as well as the 3D SemanticKITTI & ScanNet datasets.
arXiv Detail & Related papers (2021-01-19T19:32:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.