Exploiting Inductive Bias in Transformer for Point Cloud Classification
and Segmentation
- URL: http://arxiv.org/abs/2304.14124v1
- Date: Thu, 27 Apr 2023 12:17:35 GMT
- Title: Exploiting Inductive Bias in Transformer for Point Cloud Classification
and Segmentation
- Authors: Zihao Li, Pan Gao, Hui Yuan, Ran Wei, Manoranjan Paul
- Abstract summary: In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations.
Local feature learning is performed through Relative Position Encoding and Attentive Feature Pooling.
We demonstrate its superiority experimentally on classification and segmentation tasks.
- Score: 22.587913528540465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discovering inter-point connections for efficient high-dimensional feature
extraction from point coordinates is a key challenge in processing point clouds.
Most existing methods focus on designing efficient local feature extractors
while ignoring global connections, or vice versa. In this paper, we design a new
Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point
relations, which considers both local and global attentions. Specifically,
considering local spatial coherence, local feature learning is performed
through Relative Position Encoding and Attentive Feature Pooling. We
incorporate the learned locality into the Transformer module. The local feature
affects the value component in the Transformer to modulate the relationship between
the channels of each point, enhancing the self-attention mechanism with
locality-based channel interaction. We demonstrate its superiority
experimentally on classification and segmentation tasks. The code is available
at: https://github.com/jiamang/IBT
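To make the mechanism concrete, here is a minimal PyTorch sketch of locality-modulated attention as the abstract describes it: a learned per-point local feature gates the value projection channel-wise before self-attention aggregates it. The class name, gating form, and shapes are illustrative assumptions; the authors' actual implementation is in the repository linked above.

```python
import torch
import torch.nn as nn


class LocalityModulatedAttention(nn.Module):
    """Self-attention whose values are gated by a learned local feature."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Map the learned local feature to a channel-wise gate for the values.
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
        # x, local_feat: (batch, num_points, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        v = v * self.gate(local_feat)  # locality-based channel interaction
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v  # (batch, num_points, dim)
```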
Related papers
- LoFLAT: Local Feature Matching using Focused Linear Attention Transformer [36.53651224633837]
We propose LoFLAT, a novel local feature matching method using a Focused Linear Attention Transformer.
Our LoFLAT consists of three main modules: the Feature Extraction Module, the Feature Transformer Module, and the Matching Module.
The proposed LoFLAT outperforms the LoFTR method in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2024-10-30T05:38:07Z)
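For intuition, here is a hedged sketch of linear attention with a simple "focusing" feature map, the efficiency idea behind LoFLAT; the paper's exact focused linear attention differs in its details, and the kernel form and exponent p below are assumptions.

```python
import torch


def focusing_map(x: torch.Tensor, p: int = 3) -> torch.Tensor:
    # A non-negative kernel whose power sharpens the attention distribution,
    # keeping linear attention "focused" rather than overly uniform.
    x = torch.relu(x) + 1e-6
    return (x / x.norm(dim=-1, keepdim=True)) ** p


def linear_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, seq, dim); cost is O(seq * dim^2) instead of O(seq^2 * dim).
    q, k = focusing_map(q), focusing_map(k)
    kv = k.transpose(-2, -1) @ v            # (batch, dim, dim)
    norm = q @ k.sum(dim=-2).unsqueeze(-1)  # (batch, seq, 1)
    return (q @ kv) / (norm + 1e-6)
```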
- GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation [33.72549134362884]
We propose GSTran, a novel transformer network tailored for the segmentation task.
The proposed network consists of two principal components: a local geometric transformer and a global semantic transformer.
Experiments on ShapeNetPart and S3DIS benchmarks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-08-21T12:12:37Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from 2D transformers well-trained on massive images.
Experiments on the PointDA-10 and Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art unsupervised domain adaptation (UDA) performance for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
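A generic relational-distillation objective of the kind this summary suggests can be sketched as follows; this is an illustrative assumption, not the RPD paper's exact loss: the 3D student is trained so that pairwise relations among its features match those of a frozen, image-pretrained 2D transformer.

```python
import torch
import torch.nn.functional as F


def relation_matrix(feats: torch.Tensor) -> torch.Tensor:
    # feats: (batch, num_tokens, dim) -> pairwise cosine similarity of tokens.
    feats = F.normalize(feats, dim=-1)
    return feats @ feats.transpose(-2, -1)


def relational_distill_loss(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    # Align the student's token-to-token relations with the frozen teacher's,
    # transferring the relational prior rather than raw features.
    return F.mse_loss(relation_matrix(student), relation_matrix(teacher).detach())
```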
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers [60.51925353387151]
We propose a novel module named Local Context Propagation (LCP) to exploit the message passing between neighboring local regions.
We use the overlap points of adjacent local regions as intermediaries, then re-weight the features of these shared points from different local regions before passing them to the next layers.
The proposed method is applicable to different tasks and outperforms various transformer-based methods in benchmarks including 3D shape classification and dense prediction tasks.
arXiv Detail & Related papers (2022-10-23T15:43:01Z)
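The propagation step described above, points shared by adjacent local regions carrying context between them, can be sketched as below. The plain averaging rule and tensor layout are assumptions; the paper re-weights the shared-point features rather than averaging them.

```python
import torch


def propagate_shared_points(region_feats: torch.Tensor,
                            point_idx: torch.Tensor,
                            num_points: int) -> torch.Tensor:
    # region_feats: (num_regions, k, dim), features of the k points in each region
    # point_idx:    (num_regions, k), global index (long) of each sampled point
    num_regions, k, dim = region_feats.shape
    flat_idx = point_idx.reshape(-1)
    flat_feat = region_feats.reshape(-1, dim)
    # Accumulate every occurrence of a point, then average: a point that sits in
    # several regions ends up with one fused, cross-region feature.
    fused = torch.zeros(num_points, dim).index_add_(0, flat_idx, flat_feat)
    counts = torch.zeros(num_points).index_add_(0, flat_idx, torch.ones(flat_idx.numel()))
    fused = fused / counts.clamp(min=1).unsqueeze(-1)
    # Scatter the fused features back so adjacent regions now share context.
    return fused[point_idx]  # (num_regions, k, dim)
```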
- 3DGTN: 3D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation [21.054928631088575]
This paper presents a novel point cloud representation learning network, called 3D Dual Self-attention Global Local (GLocal) Transformer Network (3DGTN).
The proposed framework is evaluated on both classification and segmentation datasets.
arXiv Detail & Related papers (2022-09-21T14:34:21Z)
- Points to Patches: Enabling the Use of Self-Attention for 3D Shape Recognition [19.89482062012177]
We propose a two-stage Point Transformer-in-Transformer (Point-TnT) approach which combines local and global attention mechanisms.
Experiments on shape classification show that such an approach provides more useful features for downstream tasks than the baseline Transformer.
We also extend our method to feature matching for scene reconstruction, showing that it can be used in conjunction with existing scene reconstruction pipelines.
arXiv Detail & Related papers (2022-04-08T09:31:24Z)
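The two-stage structure can be sketched roughly as follows; the patch layout, mean-pooling of patch descriptors, and use of nn.MultiheadAttention are assumptions rather than the authors' code.

```python
import torch
import torch.nn as nn


class TwoStageAttention(nn.Module):
    """Local attention inside point patches, then global attention across them."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, k, dim), k neighbors per anchor point
        b, p, k, d = patches.shape
        local = patches.reshape(b * p, k, d)
        local, _ = self.local_attn(local, local, local)       # within each patch
        tokens = local.mean(dim=1).reshape(b, p, d)           # one token per patch
        tokens, _ = self.global_attn(tokens, tokens, tokens)  # across patches
        return tokens  # (batch, num_patches, dim)
```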
- LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn an object localizer solely from image-level labels.
We propose a novel framework built upon the transformer, termed LCTR, which aims to enhance the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, under comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
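A heavily simplified sketch of the coupling idea follows; the official Conformer couples the branches through dedicated feature coupling units across resolutions, so the same-resolution layout below is an assumption for brevity.

```python
import torch
import torch.nn as nn


class CouplingBlock(nn.Module):
    """One block where a CNN branch and a transformer branch exchange features."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.to_tokens = nn.Conv2d(channels, channels, 1)  # CNN -> transformer
        self.to_map = nn.Conv2d(channels, channels, 1)     # transformer -> CNN

    def forward(self, fmap: torch.Tensor, tokens: torch.Tensor):
        # fmap: (b, c, h, w) local features; tokens: (b, h*w, c) global tokens
        b, c, h, w = fmap.shape
        fmap = torch.relu(self.conv(fmap))
        # Couple the branches: local detail feeds attention, and vice versa.
        tokens = tokens + self.to_tokens(fmap).flatten(2).transpose(1, 2)
        tokens, _ = self.attn(tokens, tokens, tokens)
        fmap = fmap + self.to_map(tokens.transpose(1, 2).reshape(b, c, h, w))
        return fmap, tokens
```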
- LocalViT: Bringing Locality to Vision Transformers [132.42018183859483]
Locality is essential for images since it pertains to structures like lines, edges, shapes, and even objects.
We add locality to vision transformers by introducing depth-wise convolution into the feed-forward network.
This seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks.
arXiv Detail & Related papers (2021-04-12T17:59:22Z)
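The stated change, a depth-wise convolution inserted between the two linear layers of the feed-forward network, is simple enough to sketch directly; the hidden size, activation, and grid handling below are assumptions.

```python
import torch
import torch.nn as nn


class LocalFeedForward(nn.Module):
    """A ViT feed-forward network with a depth-wise convolution in the middle."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        # groups=hidden makes the 3x3 convolution depth-wise (one filter per
        # channel), so tokens mix with spatial neighbors at little extra cost.
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (batch, h*w, dim), patch tokens laid out on an h x w grid
        x = torch.relu(self.fc1(tokens))
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)  # sequence -> 2D feature map
        x = torch.relu(self.dwconv(x))
        x = x.flatten(2).transpose(1, 2)           # feature map -> sequence
        return self.fc2(x)
```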
- Feature Pyramid Transformer [121.50066435635118]
We propose a fully active feature interaction across both space and scales, called Feature Pyramid Transformer (FPT).
FPT transforms any feature pyramid into another feature pyramid of the same size but with richer contexts.
We conduct extensive experiments in both instance-level (i.e., object detection and instance segmentation) and pixel-level segmentation tasks.
arXiv Detail & Related papers (2020-07-18T15:16:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.