Point Transformer
- URL: http://arxiv.org/abs/2011.00931v2
- Date: Thu, 14 Oct 2021 10:51:39 GMT
- Title: Point Transformer
- Authors: Nico Engel, Vasileios Belagiannis and Klaus Dietmayer
- Abstract summary: Point Transformer is a deep neural network that operates on unordered and unstructured point sets.
We introduce the local-global attention mechanism, which aims to capture spatial point relations and shape information.
The output of Point Transformer is a sorted and permutation invariant feature list that can directly be incorporated into computer vision applications.
- Score: 15.312334863052968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present Point Transformer, a deep neural network that
operates directly on unordered and unstructured point sets. We design Point
Transformer to extract local and global features and relate both
representations by introducing the local-global attention mechanism, which aims
to capture spatial point relations and shape information. For that purpose, we
propose SortNet, as part of the Point Transformer, which induces input
permutation invariance by selecting points based on a learned score. The output
of Point Transformer is a sorted and permutation invariant feature list that
can directly be incorporated into common computer vision applications. We
evaluate our approach on standard classification and part segmentation
benchmarks to demonstrate competitive results compared to the prior work. Code
is publicly available at: https://github.com/engelnico/point-transformer
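The SortNet idea described above (inducing permutation invariance by selecting points via a learned score and ordering the output by that score) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the linear scoring weights `weights` stand in for a learned scoring network, and `k` is an arbitrary choice.

```python
import numpy as np

def sortnet_select(points, weights, k=4):
    """Select the top-k points by a learned score and order them by that
    score, so the output is invariant to the input's ordering."""
    scores = points @ weights                # (N,) scalar score per point
    top_k = np.argsort(scores)[::-1][:k]     # indices of the k highest scores
    return points[top_k]                     # sorted, permutation-invariant list

rng = np.random.default_rng(0)
pts = rng.normal(size=(10, 3))               # toy point cloud: N=10 points in R^3
w = rng.normal(size=3)                       # stand-in for a learned scoring MLP
out_a = sortnet_select(pts, w)
out_b = sortnet_select(rng.permutation(pts), w)   # same points, shuffled rows
assert np.allclose(out_a, out_b)             # identical output either way
```

Because the output ordering depends only on the scores, not on the input ordering, downstream layers can consume the feature list like any fixed-size tensor.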
Related papers
- PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer [75.2251801053839]
We present a novel Point-Voxel Transformer for single-stage 3D detection (PVT-SSD)
We propose a Point-Voxel Transformer (PVT) module that obtains long-range contexts in a cheap manner from voxels.
The experiments on several autonomous driving benchmarks verify the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2023-05-11T07:37:15Z)
- Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation [22.587913528540465]
In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations.
Local feature learning is performed through Relative Position encoding and Attentive Feature Pooling.
We demonstrate its superiority experimentally on classification and segmentation tasks.
arXiv Detail & Related papers (2023-04-27T12:17:35Z)
- Self-positioning Point-based Transformer for Point Cloud Understanding [18.394318824968263]
Self-Positioning point-based Transformer (SPoTr) is designed to capture both local and global shape contexts with reduced complexity.
SPoTr achieves an accuracy gain of 2.6% over the previous best models on shape classification with ScanObjectNN.
arXiv Detail & Related papers (2023-03-29T04:27:11Z)
- Vision Transformer with Quadrangle Attention [76.35955924137986]
We propose a novel quadrangle attention (QA) method that extends the window-based attention to a general quadrangle formulation.
Our method employs an end-to-end learnable quadrangle regression module that predicts a transformation matrix to transform default windows into target quadrangles.
We integrate QA into plain and hierarchical vision transformers to create a new architecture named QFormer, which offers minor code modifications and negligible extra computational cost.
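The core geometric step of quadrangle attention, mapping a default window's corners through a predicted transformation, can be illustrated with a projective transform. This is a hedged sketch under assumptions, not QFormer's code: the 3x3 matrix `t` stands in for the output of the learnable regression module.

```python
import numpy as np

def window_to_quadrangle(corners, t):
    """Map a default window's four corners through a predicted 3x3
    projective transform to obtain a target quadrangle."""
    homo = np.hstack([corners, np.ones((len(corners), 1))])  # homogeneous coords
    mapped = homo @ t.T
    return mapped[:, :2] / mapped[:, 2:3]    # divide out w to get Cartesian coords

win = np.array([[0., 0.], [8., 0.], [8., 8.], [0., 8.]])  # default 8x8 window
identity = np.eye(3)                         # "no change" regression output
quad = window_to_quadrangle(win, identity)   # equals win: plain window attention
scaled = window_to_quadrangle(win, np.diag([2., 2., 1.]))  # window grown 2x
```

With the identity transform this reduces to ordinary window attention, which is why the authors can describe QA as a generalization of window-based attention.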
arXiv Detail & Related papers (2023-03-27T11:13:50Z)
- Point Cloud Recognition with Position-to-Structure Attention Transformers [24.74805434602145]
Position-to-Structure Attention Transformers (PS-Former) is a Transformer-based algorithm for 3D point cloud recognition.
PS-Former deals with the challenge in 3D point cloud representation where points are not positioned in a fixed grid structure.
PS-Former demonstrates competitive experimental results on three 3D point cloud tasks including classification, part segmentation, and scene segmentation.
arXiv Detail & Related papers (2022-10-05T05:40:33Z)
- PointConvFormer: Revenge of the Point-based Convolution [7.539787913497268]
We introduce PointConvFormer, a novel building block for point cloud based deep network architectures.
Inspired by generalization theory, PointConvFormer combines ideas from point convolution, where filter weights are only based on relative position, and Transformers which utilize feature-based attention.
Our results show that PointConvFormer offers a better accuracy-speed tradeoff than classic convolutions, regular transformers, and voxelized sparse convolution approaches.
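The combination described above, convolution weights derived only from relative positions, modulated by feature-based attention, can be sketched for a single point. All shapes and the linear position mapping `w_pos` are illustrative assumptions; the paper's block uses learned networks and a full neighborhood pipeline.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pointconvformer_point(i, xyz, feats, nbrs, w_pos):
    """One output feature for point i: convolution weights come only from
    neighbors' relative positions (as in point convolution); a feature-
    difference attention then reweights each neighbor's contribution."""
    rel = xyz[nbrs] - xyz[i]                 # (k, 3) relative positions
    conv_w = rel @ w_pos                     # (k,) position-based filter weights
    att = softmax(-np.linalg.norm(feats[nbrs] - feats[i], axis=1))  # (k,)
    return (conv_w * att) @ feats[nbrs]      # (d,) aggregated output feature

rng = np.random.default_rng(1)
xyz = rng.normal(size=(8, 3))                # toy point cloud
feats = rng.normal(size=(8, 5))              # per-point features, d=5
w_pos = rng.normal(size=3)                   # stand-in for a learned weight MLP
out = pointconvformer_point(0, xyz, feats, np.array([1, 2, 3]), w_pos)
```

The attention term lets the layer downweight neighbors whose features disagree with the center point, which is the paper's stated route to better generalization than pure point convolution.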
arXiv Detail & Related papers (2022-08-04T20:31:46Z)
- Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation [58.4650849317274]
Volumetric Aggregation with Transformers (VAT) is a cost aggregation network for few-shot segmentation.
VAT attains state-of-the-art performance for semantic correspondence as well, where cost aggregation also plays a central role.
arXiv Detail & Related papers (2022-07-22T04:10:30Z)
- Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- Vision Transformer with Progressive Sampling [73.60630716500154]
We propose an iterative and progressive sampling strategy to locate discriminative regions.
When trained from scratch on ImageNet, PS-ViT performs 3.8% higher than the vanilla ViT in terms of top-1 accuracy.
arXiv Detail & Related papers (2021-08-03T18:04:31Z)
- LocalViT: Bringing Locality to Vision Transformers [132.42018183859483]
Locality is essential for images since it pertains to structures like lines, edges, shapes, and even objects.
We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network.
This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks.
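The LocalViT idea, inserting a depth-wise convolution between the two linear layers of a transformer FFN, is simple enough to show in miniature. This is a sketch under assumptions (naive loops, ReLU, no normalization or residual), not the paper's block.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution with zero padding: each channel is
    filtered by its own kernel, mixing spatial neighbors but never channels."""
    c, h, w = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(pad[ch, i:i + 3, j:j + 3] * kernels[ch])
    return out

def local_ffn(tokens, w1, w2, kernels, grid_hw):
    """Transformer FFN with a depth-wise conv between its two linear
    layers: tokens are laid back onto their 2D grid so the conv can
    inject local spatial structure."""
    h = np.maximum(tokens @ w1, 0)           # expand + ReLU, (N, d_hid)
    grid = h.T.reshape(-1, *grid_hw)         # (d_hid, H, W): tokens on the grid
    h = depthwise_conv3x3(grid, kernels).reshape(h.shape[1], -1).T
    return h @ w2                            # project back, (N, d)

rng = np.random.default_rng(2)
N, d, d_hid, hw = 16, 8, 12, (4, 4)          # 4x4 grid of 16 tokens
tokens = rng.normal(size=(N, d))
out = local_ffn(tokens, rng.normal(size=(d, d_hid)),
                rng.normal(size=(d_hid, d)),
                rng.normal(size=(d_hid, 3, 3)), hw)
```

Because the convolution is depth-wise, the added parameter and compute cost is small relative to the FFN's linear layers, which is what makes the modification cheap.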
arXiv Detail & Related papers (2021-04-12T17:59:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.