Exploiting Inductive Bias in Transformer for Point Cloud Classification
and Segmentation
- URL: http://arxiv.org/abs/2304.14124v1
- Date: Thu, 27 Apr 2023 12:17:35 GMT
- Title: Exploiting Inductive Bias in Transformer for Point Cloud Classification
and Segmentation
- Authors: Zihao Li, Pan Gao, Hui Yuan, Ran Wei, Manoranjan Paul
- Abstract summary: In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations.
Local feature learning is performed through Relative Position, Attentive Feature Pooling.
We demonstrate its superiority experimentally on classification and segmentation tasks.
- Score: 22.587913528540465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discovering inter-point connections for efficient high-dimensional feature
extraction from point coordinates is a key challenge in processing point clouds.
Most existing methods focus on designing efficient local feature extractors
while ignoring global connection, or vice versa. In this paper, we design a new
Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point
relations, which considers both local and global attentions. Specifically,
considering local spatial coherence, local feature learning is performed
through Relative Position Encoding and Attentive Feature Pooling. We
incorporate the learned locality into the Transformer module. The local feature
affects value component in Transformer to modulate the relationship between
channels of each point, which can enhance self-attention mechanism with
locality based channel interaction. We demonstrate its superiority
experimentally on classification and segmentation tasks. The code is available
at: https://github.com/jiamang/IBT
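The abstract's idea of letting a learned local feature modulate the value component of self-attention can be sketched as below. This is a minimal NumPy illustration under stated assumptions (a sigmoid channel gate, single-head attention, hypothetical function names), not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def locality_modulated_attention(x, local_feat, wq, wk, wv):
    """x: (N, C) point features; local_feat: (N, C) learned local feature.
    The local feature gates the value channels before attention,
    a simplified stand-in for IBT's locality-aware value component."""
    q, k, v = x @ wq, x @ wk, x @ wv
    gate = 1.0 / (1.0 + np.exp(-local_feat))  # sigmoid gate per channel
    v = v * gate                              # locality-based channel interaction
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return attn @ v
```

The gating choice is an assumption for illustration; the paper's exact modulation may differ.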
Related papers
- Point Cloud Classification Using Content-based Transformer via
Clustering in Feature Space [25.57569871876213]
We propose a point content-based Transformer architecture, called PointConT for short.
It exploits the locality of points in the feature space (content-based), which clusters the sampled points with similar features into the same class and computes the self-attention within each class.
We also introduce an Inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in each branch separately.
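The content-based locality above (attention restricted to feature-space clusters) can be sketched in NumPy; cluster labels would come from e.g. k-means, and the function name is illustrative:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cluster_attention(x, labels):
    """x: (N, C) point features; labels: (N,) feature-space cluster ids.
    Self-attention is computed only among points in the same cluster,
    a simplified stand-in for PointConT's content-based attention."""
    out = np.zeros_like(x)
    d = np.sqrt(x.shape[-1])
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        xi = x[idx]
        out[idx] = softmax(xi @ xi.T / d) @ xi
    return out
```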
arXiv Detail & Related papers (2023-03-08T14:11:05Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers [60.51925353387151]
We propose a novel module named Local Context Propagation (LCP) to exploit the message passing between neighboring local regions.
We use the overlap points of adjacent local regions as intermediaries, then re-weight the features of these shared points from different local regions before passing them to the next layers.
The proposed method is applicable to different tasks and outperforms various transformer-based methods in benchmarks including 3D shape classification and dense prediction tasks.
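The re-weighting of shared overlap points described above can be sketched as a simple fusion step; this is a minimal NumPy stand-in (plain averaging, hypothetical function name), not LCP's learned re-weighting:

```python
import numpy as np

def propagate_overlap(feat_a, feat_b, idx_a, idx_b):
    """feat_a: (Na, C), feat_b: (Nb, C) features of two adjacent regions;
    idx_a/idx_b index the same physical points in each region.
    Shared points act as intermediaries: their features are fused
    before being passed to the next layer."""
    fused = 0.5 * (feat_a[idx_a] + feat_b[idx_b])
    feat_a, feat_b = feat_a.copy(), feat_b.copy()
    feat_a[idx_a] = fused
    feat_b[idx_b] = fused
    return feat_a, feat_b
```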
arXiv Detail & Related papers (2022-10-23T15:43:01Z)
- 3DGTN: 3D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation [21.054928631088575]
This paper presents a novel point cloud representation learning network, called 3D Dual Self-attention Global Local (GLocal) Transformer Network (3DGTN).
The proposed framework is evaluated on both classification and segmentation datasets.
arXiv Detail & Related papers (2022-09-21T14:34:21Z)
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z)
- Points to Patches: Enabling the Use of Self-Attention for 3D Shape Recognition [19.89482062012177]
We propose a two-stage Point Transformer-in-Transformer (Point-TnT) approach which combines local and global attention mechanisms.
Experiments on shape classification show that such an approach provides more useful features for downstream tasks than the baseline Transformer.
We also extend our method to feature matching for scene reconstruction, showing that it can be used in conjunction with existing scene reconstruction pipelines.
arXiv Detail & Related papers (2022-04-08T09:31:24Z)
- LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn an object localizer solely from image-level labels.
We propose a novel framework built upon the transformer, termed LCTR, which aims to enhance the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, under the comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
- LocalViT: Bringing Locality to Vision Transformers [132.42018183859483]
Locality is essential for images since it pertains to structures like lines, edges, shapes, and even objects.
We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network.
This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks.
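The depth-wise convolution inserted into the feed-forward network can be sketched as below; a minimal NumPy version of a zero-padded 3x3 depth-wise convolution over a token grid, with the function name and fixed kernel size as illustrative assumptions:

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """x: (H, W, C) token grid; kernels: (3, 3, C), one 3x3 filter per
    channel. Each channel is convolved independently (depth-wise),
    which is the locality operation LocalViT adds to the FFN."""
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero padding, 'same' output
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + w] * kernels[i, j]
    return out
```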
arXiv Detail & Related papers (2021-04-12T17:59:22Z)
- Point Transformer [15.312334863052968]
Point Transformer is a deep neural network that operates on unordered and unstructured point sets.
We introduce the local-global attention mechanism, which aims to capture spatial point relations and shape information.
The output of Point Transformer is a sorted and permutation invariant feature list that can directly be incorporated into computer vision applications.
arXiv Detail & Related papers (2020-11-02T12:26:14Z)
- Feature Pyramid Transformer [121.50066435635118]
We propose a fully active feature interaction across both space and scales, called Feature Pyramid Transformer (FPT).
FPT transforms any feature pyramid into another feature pyramid of the same size but with richer contexts.
We conduct extensive experiments in both instance-level (i.e., object detection and instance segmentation) and pixel-level segmentation tasks.
arXiv Detail & Related papers (2020-07-18T15:16:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.