Point Transformer V2: Grouped Vector Attention and Partition-based
Pooling
- URL: http://arxiv.org/abs/2210.05666v2
- Date: Wed, 12 Oct 2022 17:44:57 GMT
- Title: Point Transformer V2: Grouped Vector Attention and Partition-based
Pooling
- Authors: Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao
- Abstract summary: We analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model.
In particular, we first propose grouped vector attention, which is more effective than the previous version of vector attention.
Our model achieves better performance than its predecessor and achieves state-of-the-art on several challenging 3D point cloud understanding benchmarks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a pioneering work exploring transformer architecture for 3D point cloud
understanding, Point Transformer achieves impressive results on multiple highly
competitive benchmarks. In this work, we analyze the limitations of the Point
Transformer and propose our powerful and efficient Point Transformer V2 model
with novel designs that overcome the limitations of previous work. In
particular, we first propose grouped vector attention, which is more effective
than the previous version of vector attention. Inheriting the advantages of
both learnable weight encoding and multi-head attention, we present a highly
effective implementation of grouped vector attention with a novel grouped
weight encoding layer. We also strengthen the position information for
attention by an additional position encoding multiplier. Furthermore, we design
novel and lightweight partition-based pooling methods which enable better
spatial alignment and more efficient sampling. Extensive experiments show that
our model achieves better performance than its predecessor and achieves
state-of-the-art on several challenging 3D point cloud understanding
benchmarks, including 3D point cloud segmentation on ScanNet v2 and S3DIS and
3D point cloud classification on ModelNet40. Our code will be available at
https://github.com/Gofinge/PointTransformerV2.
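The two novelties summarized above, grouped vector attention and partition-based (grid) pooling, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `weight_enc` is a single linear map standing in for the paper's grouped weight encoding layer (an MLP in practice), grid pooling fuses points by simple averaging, and all function names and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_vector_attention(q, k, v, delta, n_groups, weight_enc):
    """Grouped vector attention for a single query point (sketch).

    q          : (c,)   query feature
    k, v       : (n, c) key/value features of the n neighbours
    delta      : (n, c) relative position encoding
    n_groups   : number of channel groups g (c must be divisible by g)
    weight_enc : grouped weight encoding, maps (n, c) -> (n, g)
    """
    n, c = k.shape
    r = q[None, :] - k + delta               # relation vectors, (n, c)
    w = softmax(weight_enc(r), axis=0)       # one weight per group, normalized over neighbours
    w = np.repeat(w, c // n_groups, axis=1)  # share each group weight across its channels
    return (w * (v + delta)).sum(axis=0)     # aggregated feature, (c,)

def grid_pool(points, feats, cell):
    """Partition-based pooling sketch: points falling in the same uniform
    grid cell are fused (here by averaging, an assumption) into one point."""
    keys = np.floor(points / cell).astype(int)
    cells, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.ravel()
    counts = np.bincount(inv, minlength=len(cells)).astype(float)[:, None]
    pooled_p = np.zeros((len(cells), points.shape[1]))
    pooled_f = np.zeros((len(cells), feats.shape[1]))
    np.add.at(pooled_p, inv, points)         # scatter-add per cell
    np.add.at(pooled_f, inv, feats)
    return pooled_p / counts, pooled_f / counts

# Toy usage: 5 neighbours, 8 channels, 2 groups, linear weight encoding.
rng = np.random.default_rng(0)
c, n, g = 8, 5, 2
W = rng.normal(size=(c, g)) / np.sqrt(c)     # hypothetical weight-encoding matrix
out = grouped_vector_attention(rng.normal(size=c), rng.normal(size=(n, c)),
                               rng.normal(size=(n, c)), rng.normal(size=(n, c)),
                               g, lambda r: r @ W)
pts, fts = grid_pool(rng.uniform(0, 2, size=(100, 3)), rng.normal(size=(100, 8)), cell=1.0)
```

With `n_groups = c` this degenerates to the per-channel vector attention of the original Point Transformer, while smaller `n_groups` shares weights within channel groups, which is the parameter-efficiency knob the paper exploits.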
Related papers
- PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection
3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars.
We propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection.
arXiv Detail & Related papers (2024-05-05T04:44:41Z)
- Hierarchical Point Attention for Indoor 3D Object Detection
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z)
- Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
We present Progressive Point Patch Embedding and a new point cloud Transformer model named PViT.
PViT shares the same backbone as the standard Transformer but is shown to be less data-hungry, enabling Transformers to achieve performance comparable to the state of the art.
We formulate a simple yet effective pipeline dubbed "Pix4Point" that allows harnessing Transformers pretrained in the image domain to enhance downstream point cloud understanding.
arXiv Detail & Related papers (2022-08-25T17:59:29Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning
This work adopts transformers and incorporates them into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- Stratified Transformer for 3D Point Cloud Segmentation
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- 3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
This paper presents a novel hierarchical framework that incorporates convolution with Transformer for point cloud classification.
Our method achieves state-of-the-art classification performance, in terms of both accuracy and efficiency.
arXiv Detail & Related papers (2022-03-02T02:42:14Z)
- Deep Point Cloud Reconstruction
Point clouds obtained from 3D scanning are often sparse, noisy, and irregular.
To cope with these issues, recent studies have separately sought to densify, denoise, and complete inaccurate point clouds.
We propose a deep point cloud reconstruction network consisting of two stages: 1) a 3D sparse stacked-hourglass network for initial densification and denoising, and 2) a refinement via transformers that converts the discrete voxels into 3D points.
arXiv Detail & Related papers (2021-11-23T07:53:28Z)
- PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection
We propose the Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for accurate 3D detection from point clouds.
Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z)
- The Devils in the Point Clouds: Studying the Robustness of Point Cloud Convolutions
This paper investigates different variants of PointConv, a convolution network on point clouds, to examine their robustness to input scale and rotation changes.
We derive a novel viewpoint-invariant descriptor by utilizing 3D geometric properties as the input to PointConv.
Experiments are conducted on the 2D MNIST & CIFAR-10 datasets as well as the 3D SemanticKITTI & ScanNet datasets.
arXiv Detail & Related papers (2021-01-19T19:32:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.