Exploiting Inductive Bias in Transformer for Point Cloud Classification
and Segmentation
- URL: http://arxiv.org/abs/2304.14124v1
- Date: Thu, 27 Apr 2023 12:17:35 GMT
- Title: Exploiting Inductive Bias in Transformer for Point Cloud Classification
and Segmentation
- Authors: Zihao Li, Pan Gao, Hui Yuan, Ran Wei, Manoranjan Paul
- Abstract summary: In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations.
Local feature learning is performed through Relative Position Encoding and Attentive Feature Pooling.
We demonstrate its superiority experimentally on classification and segmentation tasks.
- Score: 22.587913528540465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discovering inter-point connections for efficient high-dimensional feature
extraction from point coordinates is a key challenge in processing point clouds.
Most existing methods focus on designing efficient local feature extractors
while ignoring global connections, or vice versa. In this paper, we design a new
Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point
relations, which considers both local and global attentions. Specifically,
considering local spatial coherence, local feature learning is performed
through Relative Position Encoding and Attentive Feature Pooling. We
incorporate the learned locality into the Transformer module. The local feature
affects the value component in the Transformer to modulate the relationship between
the channels of each point, enhancing the self-attention mechanism with
locality-based channel interaction. We demonstrate its superiority
experimentally on classification and segmentation tasks. The code is available
at: https://github.com/jiamang/IBT
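To make the mechanism concrete, here is a minimal PyTorch sketch of locality-modulated attention as the abstract describes it: a learned per-point local feature gates the value projection channel-wise before self-attention aggregates it. The class name, gating form, and shapes are illustrative assumptions; the authors' actual implementation is in the repository linked above.

```python
import torch
import torch.nn as nn


class LocalityModulatedAttention(nn.Module):
    """Self-attention whose values are gated by a learned local feature."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Map the learned local feature to a channel-wise gate for the values.
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
        # x, local_feat: (batch, num_points, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        v = v * self.gate(local_feat)  # locality-based channel interaction
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v  # (batch, num_points, dim)
```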
Related papers
- LoFLAT: Local Feature Matching using Focused Linear Attention Transformer [36.53651224633837]
We propose LoFLAT, a novel local feature matching method using a Focused Linear Attention Transformer.
Our LoFLAT consists of three main modules: the Feature Extraction Module, the Feature Transformer Module, and the Matching Module.
The proposed LoFLAT outperforms the LoFTR method in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2024-10-30T05:38:07Z)
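For intuition, here is a hedged sketch of linear attention with a simple "focusing" feature map, the efficiency idea behind LoFLAT; the paper's exact focused linear attention differs in its details, and the kernel form and exponent p below are assumptions.

```python
import torch


def focusing_map(x: torch.Tensor, p: int = 3) -> torch.Tensor:
    # A non-negative kernel whose power sharpens the attention distribution,
    # keeping linear attention "focused" rather than overly uniform.
    x = torch.relu(x) + 1e-6
    return (x / x.norm(dim=-1, keepdim=True)) ** p


def linear_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, seq, dim); cost is O(seq * dim^2) instead of O(seq^2 * dim).
    q, k = focusing_map(q), focusing_map(k)
    kv = k.transpose(-2, -1) @ v            # (batch, dim, dim)
    norm = q @ k.sum(dim=-2).unsqueeze(-1)  # (batch, seq, 1)
    return (q @ kv) / (norm + 1e-6)
```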
- GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation [33.72549134362884]
We propose GSTran, a novel transformer network tailored for the segmentation task.
The proposed network consists of two principal components: a local geometric transformer and a global semantic transformer.
Experiments on ShapeNetPart and S3DIS benchmarks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-08-21T12:12:37Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from 2D transformers well-trained on massive images.
Experiments on the PointDA-10 and Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art unsupervised domain adaptation (UDA) performance for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
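A generic relational-distillation objective of the kind this summary suggests can be sketched as follows; this is an illustrative assumption, not the RPD paper's exact loss: the 3D student is trained so that pairwise relations among its features match those of a frozen, image-pretrained 2D transformer.

```python
import torch
import torch.nn.functional as F


def relation_matrix(feats: torch.Tensor) -> torch.Tensor:
    # feats: (batch, num_tokens, dim) -> pairwise cosine similarity of tokens.
    feats = F.normalize(feats, dim=-1)
    return feats @ feats.transpose(-2, -1)


def relational_distill_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    # Align the student's token-to-token relations with the frozen teacher's,
    # transferring the relational prior rather than raw features.
    return F.mse_loss(relation_matrix(student), relation_matrix(teacher).detach())
```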
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers [60.51925353387151]
We propose a novel module named Local Context Propagation (LCP) to exploit the message passing between neighboring local regions.
We use the overlap points of adjacent local regions as intermediaries, then re-weight the features of these shared points from different local regions before passing them to the next layers.
The proposed method is applicable to different tasks and outperforms various transformer-based methods in benchmarks including 3D shape classification and dense prediction tasks.
arXiv Detail & Related papers (2022-10-23T15:43:01Z)
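The propagation step described above, points shared by adjacent local regions carrying context between them, can be sketched as below. The plain averaging rule and tensor layout are assumptions; the paper re-weights the shared-point features rather than averaging them.

```python
import torch


def propagate_shared_points(region_feats: torch.Tensor,
                            point_idx: torch.Tensor,
                            num_points: int) -> torch.Tensor:
    # region_feats: (num_regions, k, dim), features of the k points in each region
    # point_idx:    (num_regions, k), global index (long) of each sampled point
    num_regions, k, dim = region_feats.shape
    flat_idx = point_idx.reshape(-1)
    flat_feat = region_feats.reshape(-1, dim)
    # Accumulate every occurrence of a point, then average: a point that sits in
    # several regions ends up with one fused, cross-region feature.
    fused = torch.zeros(num_points, dim).index_add_(0, flat_idx, flat_feat)
    counts = torch.zeros(num_points).index_add_(0, flat_idx, torch.ones(flat_idx.numel()))
    fused = fused / counts.clamp(min=1).unsqueeze(-1)
    # Scatter the fused features back so adjacent regions now share context.
    return fused[point_idx]  # (num_regions, k, dim)
```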
- 3DGTN: 3D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation [21.054928631088575]
This paper presents a novel point cloud representation learning network, called 3D Dual Self-attention Global Local (GLocal) Transformer Network (3DGTN).
The proposed framework is evaluated on both classification and segmentation datasets.
arXiv Detail & Related papers (2022-09-21T14:34:21Z)
- Points to Patches: Enabling the Use of Self-Attention for 3D Shape Recognition [19.89482062012177]
We propose a two-stage Point Transformer-in-Transformer (Point-TnT) approach which combines local and global attention mechanisms.
Experiments on shape classification show that such an approach provides more useful features for downstream tasks than the baseline Transformer.
We also extend our method to feature matching for scene reconstruction, showing that it can be used in conjunction with existing scene reconstruction pipelines.
arXiv Detail & Related papers (2022-04-08T09:31:24Z)
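The two-stage structure can be sketched roughly as follows; the patch layout, mean-pooling of patch descriptors, and use of nn.MultiheadAttention are assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn


class TwoStageAttention(nn.Module):
    """Local attention inside point patches, then global attention across them."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, k, dim), k neighbors per anchor point
        b, p, k, d = patches.shape
        local = patches.reshape(b * p, k, d)
        local, _ = self.local_attn(local, local, local)       # within each patch
        tokens = local.mean(dim=1).reshape(b, p, d)           # one token per patch
        tokens, _ = self.global_attn(tokens, tokens, tokens)  # across patches
        return tokens  # (batch, num_patches, dim)
```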
- LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn an object localizer solely from image-level labels.
We propose a novel framework built upon the transformer, termed LCTR, which aims to enhance the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, under comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
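A heavily simplified sketch of the coupling idea follows; the official Conformer couples the branches through dedicated feature coupling units across resolutions, so the same-resolution layout below is an assumption for brevity.

```python
import torch
import torch.nn as nn


class CouplingBlock(nn.Module):
    """One block where a CNN branch and a transformer branch exchange features."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.to_tokens = nn.Conv2d(channels, channels, 1)  # CNN -> transformer
        self.to_map = nn.Conv2d(channels, channels, 1)     # transformer -> CNN

    def forward(self, fmap: torch.Tensor, tokens: torch.Tensor):
        # fmap: (b, c, h, w) local features; tokens: (b, h*w, c) global tokens
        b, c, h, w = fmap.shape
        fmap = torch.relu(self.conv(fmap))
        # Couple the branches: local detail feeds attention, and vice versa.
        tokens = tokens + self.to_tokens(fmap).flatten(2).transpose(1, 2)
        tokens, _ = self.attn(tokens, tokens, tokens)
        fmap = fmap + self.to_map(tokens.transpose(1, 2).reshape(b, c, h, w))
        return fmap, tokens
```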
- LocalViT: Bringing Locality to Vision Transformers [132.42018183859483]
Locality is essential for images since it pertains to structures like lines, edges, shapes, and even objects.
We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network.
This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks.
arXiv Detail & Related papers (2021-04-12T17:59:22Z)
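The stated change, a depth-wise convolution inserted between the two linear layers of the feed-forward network, is simple enough to sketch directly; the hidden size, activation, and grid handling below are assumptions.

```python
import torch
import torch.nn as nn


class LocalFeedForward(nn.Module):
    """A ViT feed-forward network with a depth-wise convolution in the middle."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        # groups=hidden makes the 3x3 convolution depth-wise (one filter per
        # channel), so tokens mix with spatial neighbors at little extra cost.
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (batch, h*w, dim), patch tokens laid out on an h x w grid
        x = torch.relu(self.fc1(tokens))
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)  # sequence -> 2D feature map
        x = torch.relu(self.dwconv(x))
        x = x.flatten(2).transpose(1, 2)           # feature map -> sequence
        return self.fc2(x)
```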
- Feature Pyramid Transformer [121.50066435635118]
We propose a fully active feature interaction across both space and scales, called Feature Pyramid Transformer (FPT).
FPT transforms any feature pyramid into another feature pyramid of the same size but with richer contexts.
We conduct extensive experiments in both instance-level (i.e., object detection and instance segmentation) and pixel-level segmentation tasks.
arXiv Detail & Related papers (2020-07-18T15:16:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.