Point Transformer
- URL: http://arxiv.org/abs/2011.00931v2
- Date: Thu, 14 Oct 2021 10:51:39 GMT
- Title: Point Transformer
- Authors: Nico Engel, Vasileios Belagiannis and Klaus Dietmayer
- Abstract summary: Point Transformer is a deep neural network that operates on unordered and unstructured point sets.
We introduce the local-global attention mechanism, which aims to capture spatial point relations and shape information.
The output of Point Transformer is a sorted and permutation invariant feature list that can directly be incorporated into computer vision applications.
- Score: 15.312334863052968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present Point Transformer, a deep neural network that
operates directly on unordered and unstructured point sets. We design Point
Transformer to extract local and global features and relate both
representations by introducing the local-global attention mechanism, which aims
to capture spatial point relations and shape information. For that purpose, we
propose SortNet, as part of the Point Transformer, which induces input
permutation invariance by selecting points based on a learned score. The output
of Point Transformer is a sorted and permutation invariant feature list that
can directly be incorporated into common computer vision applications. We
evaluate our approach on standard classification and part segmentation
benchmarks to demonstrate competitive results compared to the prior work. Code
is publicly available at: https://github.com/engelnico/point-transformer
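The SortNet idea described above (inducing permutation invariance by selecting points via a learned score and ordering the output by that score) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the linear scoring weights `weights` stand in for a learned scoring network, and `k` is an arbitrary choice.

```python
import numpy as np

def sortnet_select(points, weights, k=4):
    """Select the top-k points by a learned score and order them by that
    score, so the output is invariant to the input's ordering."""
    scores = points @ weights                # (N,) scalar score per point
    top_k = np.argsort(scores)[::-1][:k]     # indices of the k highest scores
    return points[top_k]                     # sorted, permutation-invariant list

rng = np.random.default_rng(0)
pts = rng.normal(size=(10, 3))               # toy point cloud: N=10 points in R^3
w = rng.normal(size=3)                       # stand-in for a learned scoring MLP
out_a = sortnet_select(pts, w)
out_b = sortnet_select(rng.permutation(pts), w)   # same points, shuffled rows
assert np.allclose(out_a, out_b)             # identical output either way
```

Because the output ordering depends only on the scores, not on the input ordering, downstream layers can consume the feature list like any fixed-size tensor.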
Related papers
- PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer [75.2251801053839]
We present a novel Point-Voxel Transformer for single-stage 3D detection (PVT-SSD)
We propose a Point-Voxel Transformer (PVT) module that obtains long-range contexts in a cheap manner from voxels.
The experiments on several autonomous driving benchmarks verify the effectiveness and efficiency of the proposed method.
arXiv Detail & Related papers (2023-05-11T07:37:15Z)
- Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation [22.587913528540465]
In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations.
Local feature learning is performed through Relative Position encoding and Attentive Feature Pooling.
We demonstrate its superiority experimentally on classification and segmentation tasks.
arXiv Detail & Related papers (2023-04-27T12:17:35Z)
- Self-positioning Point-based Transformer for Point Cloud Understanding [18.394318824968263]
Self-Positioning point-based Transformer (SPoTr) is designed to capture both local and global shape contexts with reduced complexity.
SPoTr achieves an accuracy gain of 2.6% over the previous best models on shape classification with ScanObjectNN.
arXiv Detail & Related papers (2023-03-29T04:27:11Z)
- Vision Transformer with Quadrangle Attention [76.35955924137986]
We propose a novel quadrangle attention (QA) method that extends the window-based attention to a general quadrangle formulation.
Our method employs an end-to-end learnable quadrangle regression module that predicts a transformation matrix to transform default windows into target quadrangles.
We integrate QA into plain and hierarchical vision transformers to create a new architecture named QFormer, which offers minor code modifications and negligible extra computational cost.
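The core geometric step of quadrangle attention, mapping a default window's corners through a predicted transformation, can be illustrated with a projective transform. This is a hedged sketch under assumptions, not QFormer's code: the 3x3 matrix `t` stands in for the output of the learnable regression module.

```python
import numpy as np

def window_to_quadrangle(corners, t):
    """Map a default window's four corners through a predicted 3x3
    projective transform to obtain a target quadrangle."""
    homo = np.hstack([corners, np.ones((len(corners), 1))])  # homogeneous coords
    mapped = homo @ t.T
    return mapped[:, :2] / mapped[:, 2:3]    # divide out w to get Cartesian coords

win = np.array([[0., 0.], [8., 0.], [8., 8.], [0., 8.]])  # default 8x8 window
identity = np.eye(3)                         # "no change" regression output
quad = window_to_quadrangle(win, identity)   # equals win: plain window attention
scaled = window_to_quadrangle(win, np.diag([2., 2., 1.]))  # window grown 2x
```

With the identity transform this reduces to ordinary window attention, which is why the authors can describe QA as a generalization of window-based attention.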
arXiv Detail & Related papers (2023-03-27T11:13:50Z)
- Point Cloud Recognition with Position-to-Structure Attention Transformers [24.74805434602145]
Position-to-Structure Attention Transformers (PS-Former) is a Transformer-based algorithm for 3D point cloud recognition.
PS-Former deals with the challenge in 3D point cloud representation where points are not positioned in a fixed grid structure.
PS-Former demonstrates competitive experimental results on three 3D point cloud tasks including classification, part segmentation, and scene segmentation.
arXiv Detail & Related papers (2022-10-05T05:40:33Z)
- PointConvFormer: Revenge of the Point-based Convolution [7.539787913497268]
We introduce PointConvFormer, a novel building block for point cloud based deep network architectures.
Inspired by generalization theory, PointConvFormer combines ideas from point convolution, where filter weights are only based on relative position, and Transformers which utilize feature-based attention.
Our results show that PointConvFormer offers a better accuracy-speed tradeoff than classic convolutions, regular transformers, and voxelized sparse convolution approaches.
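The combination described above, convolution weights derived only from relative positions, modulated by feature-based attention, can be sketched for a single point. All shapes and the linear position mapping `w_pos` are illustrative assumptions; the paper's block uses learned networks and a full neighborhood pipeline.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pointconvformer_point(i, xyz, feats, nbrs, w_pos):
    """One output feature for point i: convolution weights come only from
    neighbors' relative positions (as in point convolution); a feature-
    difference attention then reweights each neighbor's contribution."""
    rel = xyz[nbrs] - xyz[i]                 # (k, 3) relative positions
    conv_w = rel @ w_pos                     # (k,) position-based filter weights
    att = softmax(-np.linalg.norm(feats[nbrs] - feats[i], axis=1))  # (k,)
    return (conv_w * att) @ feats[nbrs]      # (d,) aggregated output feature

rng = np.random.default_rng(1)
xyz = rng.normal(size=(8, 3))                # toy point cloud
feats = rng.normal(size=(8, 5))              # per-point features, d=5
w_pos = rng.normal(size=3)                   # stand-in for a learned weight MLP
out = pointconvformer_point(0, xyz, feats, np.array([1, 2, 3]), w_pos)
```

The attention term lets the layer downweight neighbors whose features disagree with the center point, which is the paper's stated route to better generalization than pure point convolution.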
arXiv Detail & Related papers (2022-08-04T20:31:46Z)
- Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation [58.4650849317274]
Volumetric Aggregation with Transformers (VAT) is a cost aggregation network for few-shot segmentation.
VAT attains state-of-the-art performance for semantic correspondence as well, where cost aggregation also plays a central role.
arXiv Detail & Related papers (2022-07-22T04:10:30Z)
- Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- Vision Transformer with Progressive Sampling [73.60630716500154]
We propose an iterative and progressive sampling strategy to locate discriminative regions.
When trained from scratch on ImageNet, PS-ViT performs 3.8% higher than the vanilla ViT in terms of top-1 accuracy.
arXiv Detail & Related papers (2021-08-03T18:04:31Z)
- LocalViT: Bringing Locality to Vision Transformers [132.42018183859483]
Locality is essential for images since it pertains to structures like lines, edges, shapes, and even objects.
We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network.
This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks.
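The LocalViT idea, inserting a depth-wise convolution between the two linear layers of a transformer FFN, is simple enough to show in miniature. This is a sketch under assumptions (naive loops, ReLU, no normalization or residual), not the paper's block.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution with zero padding: each channel is
    filtered by its own kernel, mixing spatial neighbors but never channels."""
    c, h, w = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(pad[ch, i:i + 3, j:j + 3] * kernels[ch])
    return out

def local_ffn(tokens, w1, w2, kernels, grid_hw):
    """Transformer FFN with a depth-wise conv between its two linear
    layers: tokens are laid back onto their 2D grid so the conv can
    inject local spatial structure."""
    h = np.maximum(tokens @ w1, 0)           # expand + ReLU, (N, d_hid)
    grid = h.T.reshape(-1, *grid_hw)         # (d_hid, H, W): tokens on the grid
    h = depthwise_conv3x3(grid, kernels).reshape(h.shape[1], -1).T
    return h @ w2                            # project back, (N, d)

rng = np.random.default_rng(2)
N, d, d_hid, hw = 16, 8, 12, (4, 4)          # 4x4 grid of 16 tokens
tokens = rng.normal(size=(N, d))
out = local_ffn(tokens, rng.normal(size=(d, d_hid)),
                rng.normal(size=(d_hid, d)),
                rng.normal(size=(d_hid, 3, 3)), hw)
```

Because the convolution is depth-wise, the added parameter and compute cost is small relative to the FFN's linear layers, which is what makes the modification cheap.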
arXiv Detail & Related papers (2021-04-12T17:59:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.