Point Cloud Learning with Transformer
- URL: http://arxiv.org/abs/2104.13636v1
- Date: Wed, 28 Apr 2021 08:39:21 GMT
- Title: Point Cloud Learning with Transformer
- Authors: Xian-Feng Han, Yu-Jia Kuang, Guo-Qiang Xiao
- Abstract summary: We introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT).
Specifically, a point pyramid transformer is investigated to model features with diverse resolutions or scales.
A multi-level transformer module is designed to aggregate contextual information from different levels of each scale and enhance their interactions.
- Score: 2.3204178451683264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The remarkable performance of Transformer networks in Natural Language Processing has promoted the development of these models for computer vision tasks such as image recognition and segmentation. In this paper, we introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT), that works directly on irregular point clouds for representation learning. Specifically, a point pyramid transformer is investigated to model features at the diverse resolutions or scales we define, followed by a multi-level transformer module that aggregates contextual information from different levels of each scale and enhances their interactions, while a multi-scale transformer module is designed to capture the dependencies among representations across different scales. Extensive evaluation on public benchmark datasets demonstrates the effectiveness and competitive performance of our method on 3D shape classification, part segmentation and semantic segmentation tasks.
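To make the pipeline concrete, below is a minimal PyTorch sketch of attention over a point pyramid: the cloud is subsampled to several resolutions, self-attention runs within each scale, and a second attention pass lets tokens from all scales interact. Module names, dimensions, and the random subsampling (a stand-in for farthest point sampling) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ScaleAttention(nn.Module):
    """Self-attention within one resolution of the point pyramid."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                         # x: (B, N, dim)
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)

class PointPyramid(nn.Module):
    """Subsample to coarser scales, attend within each scale,
    then let tokens from all scales interact."""
    def __init__(self, dim: int = 64, scales=(1024, 512, 256)):
        super().__init__()
        self.scales = scales
        self.embed = nn.Linear(3, dim)             # xyz -> feature
        self.per_scale = nn.ModuleList(ScaleAttention(dim) for _ in scales)
        self.cross = nn.MultiheadAttention(dim, 4, batch_first=True)

    def forward(self, xyz):                        # xyz: (B, N, 3)
        feats = []
        for n, attn in zip(self.scales, self.per_scale):
            idx = torch.randperm(xyz.shape[1])[:n]    # stand-in for FPS
            feats.append(attn(self.embed(xyz[:, idx])))
        tokens = torch.cat(feats, dim=1)           # tokens from every scale
        out, _ = self.cross(tokens, tokens, tokens)   # cross-scale interaction
        return out.max(dim=1).values               # global shape descriptor

x = torch.randn(2, 2048, 3)                        # two clouds of 2048 points
print(PointPyramid()(x).shape)                     # torch.Size([2, 64])
```

A per-task head (e.g. an MLP classifier over the pooled descriptor) would sit on top of this backbone.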
Related papers
- PointCAT: Cross-Attention Transformer for point cloud [1.3176016397292067]
We present Point Cross-Attention Transformer (PointCAT), a novel end-to-end network architecture.
Our approach combines multi-scale features via two separate cross-attention transformer branches.
Our method outperforms or achieves comparable performance to several approaches on shape classification, part segmentation and semantic segmentation tasks (a toy cross-attention sketch follows this entry).
arXiv Detail & Related papers (2023-04-06T11:58:18Z)
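As a rough illustration of the two-branch design described above, the following sketch cross-attends features from a coarse and a fine scale; the class name, dimensions, and residual fusion are assumptions for illustration, not PointCAT's actual architecture.

```python
import torch
import torch.nn as nn

class CrossBranch(nn.Module):
    """Two branches, each querying the other scale's tokens."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.a_to_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b_to_a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat_a, feat_b):
        a, _ = self.a_to_b(feat_a, feat_b, feat_b)   # coarse attends to fine
        b, _ = self.b_to_a(feat_b, feat_a, feat_a)   # fine attends to coarse
        return feat_a + a, feat_b + b                # residual fusion

coarse = torch.randn(2, 256, 64)    # tokens from a coarse scale
fine = torch.randn(2, 1024, 64)     # tokens from a fine scale
a, b = CrossBranch()(coarse, fine)
print(a.shape, b.shape)             # (2, 256, 64) (2, 1024, 64)
```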
- Vision Transformer with Quadrangle Attention [76.35955924137986]
We propose a novel quadrangle attention (QA) method that extends the window-based attention to a general quadrangle formulation.
Our method employs an end-to-end learnable quadrangle regression module that predicts a transformation matrix to transform default windows into target quadrangles.
We integrate QA into plain and hierarchical vision transformers to create a new architecture named QFormer, which requires only minor code modifications and adds negligible computational cost (a simplified window-transform sketch follows this entry).
arXiv Detail & Related papers (2023-03-27T11:13:50Z)
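The following toy module illustrates the general idea of regressing a per-window transformation and resampling features accordingly. It uses a plain affine transform for simplicity, whereas QA predicts a more general quadrangle parameterization; all names and sizes here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WindowTransform(nn.Module):
    """Regress a transform per window and resample its pixels."""
    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 6)   # 2x3 affine matrix per window
        nn.init.zeros_(self.head.weight)
        # Bias initialized to the identity transform.
        self.head.bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, win_feat, win_pixels):
        # win_feat: (W, dim) pooled descriptor per window
        # win_pixels: (W, C, h, w) pixels of each default window
        theta = self.head(win_feat).view(-1, 2, 3)       # (W, 2, 3)
        grid = F.affine_grid(theta, win_pixels.shape, align_corners=False)
        return F.grid_sample(win_pixels, grid, align_corners=False)

wt = WindowTransform(dim=32)
feat = torch.randn(8, 32)               # 8 windows
pix = torch.randn(8, 16, 7, 7)          # their 7x7 patches, 16 channels
print(wt(feat, pix).shape)              # torch.Size([8, 16, 7, 7])
```

Initializing the head at the identity means training starts from ordinary window attention and learns deviations from it.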
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn discriminative part features and explore their correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- PSFormer: Point Transformer for 3D Salient Object Detection [8.621996554264275]
PSFormer is an encoder-decoder network that takes full advantage of transformers to model contextual information.
In the encoder, we develop a Point Context Transformer (PCT) module to capture region contextual features at the point level.
In the decoder, we develop a Scene Context Transformer (SCT) module to learn context representations at the scene level (a rough two-level sketch follows this entry).
arXiv Detail & Related papers (2022-10-28T06:34:28Z)
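A rough sketch of the two context levels: attention within fixed-size regions (point level), then attention across pooled region tokens (scene level). The grouping and fusion below are simplifying assumptions, not PSFormer's exact PCT/SCT design.

```python
import torch
import torch.nn as nn

class TwoLevelContext(nn.Module):
    def __init__(self, dim: int = 64, region: int = 64, heads: int = 4):
        super().__init__()
        self.region = region
        self.point_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.scene_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                  # x: (B, N, dim), N divisible by region
        B, N, D = x.shape
        r = x.reshape(B * N // self.region, self.region, D)
        r, _ = self.point_attn(r, r, r)    # context within each region
        region_tokens = r.reshape(B, N // self.region, self.region, D).mean(2)
        s, _ = self.scene_attn(region_tokens, region_tokens, region_tokens)
        # Broadcast scene-level context back to every point in its region.
        return r.reshape(B, N, D) + s.repeat_interleave(self.region, dim=1)

x = torch.randn(2, 1024, 64)
print(TwoLevelContext()(x).shape)          # torch.Size([2, 1024, 64])
```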
- Learning Explicit Object-Centric Representations with Vision Transformers [81.38804205212425]
We build on the self-supervision task of masked autoencoding and explore its effectiveness for learning object-centric representations with transformers.
We show that the model efficiently learns to decompose simple scenes, as measured by segmentation metrics on several multi-object benchmarks (a tiny masked-autoencoder sketch follows this entry).
arXiv Detail & Related papers (2022-10-25T16:39:49Z)
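A tiny masked-autoencoding sketch of the kind this paper builds on: mask most tokens, encode only the visible ones, then reconstruct and score the masked positions. The miniature encoder/decoder and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    def __init__(self, dim: int = 64, mask_ratio: float = 0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        def layer():
            return nn.TransformerEncoderLayer(
                dim, nhead=4, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer(), num_layers=2)
        self.decoder = nn.TransformerEncoder(layer(), num_layers=1)
        self.pred = nn.Linear(dim, dim)

    def forward(self, tokens):                   # tokens: (B, N, dim)
        B, N, D = tokens.shape
        keep = int(N * (1 - self.mask_ratio))
        perm = torch.randperm(N)
        vis, hid = perm[:keep], perm[keep:]
        encoded = self.encoder(tokens[:, vis])   # encode visible tokens only
        # Re-insert mask tokens at the hidden positions and decode.
        full = torch.empty(B, N, D, device=tokens.device)
        full[:, vis] = encoded
        full[:, hid] = self.mask_token.expand(B, hid.numel(), D)
        recon = self.pred(self.decoder(full))
        # Reconstruction loss only on the masked positions.
        return nn.functional.mse_loss(recon[:, hid], tokens[:, hid])

x = torch.randn(2, 196, 64)                      # e.g. 14x14 patch tokens
print(TinyMAE()(x))                              # scalar training loss
```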
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans), which incorporates object structure information into the transformer to enhance discriminative representation learning.
The two proposed modules are lightweight, can be plugged into any transformer network, and are easily trained end-to-end.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
- A Survey of Visual Transformers [30.082304742571598]
Transformer, an attention-based encoder-decoder architecture, has revolutionized the field of natural language processing.
Pioneering works have recently adapted Transformer architectures to computer vision (CV).
We have provided a comprehensive review of over one hundred different visual Transformers for three fundamental CV tasks.
arXiv Detail & Related papers (2021-11-11T07:56:04Z)
- Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationale behind the Vision Transformer by analogy with the proven, practical Evolutionary Algorithm (EA).
We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)
- Dual Transformer for Point Cloud Analysis [2.160196691362033]
We present a novel point cloud representation learning architecture, named Dual Transformer Network (DTNet).
Specifically, by simultaneously aggregating well-designed point-wise and channel-wise multi-head self-attention models, the DPCT module can capture much richer contextual dependencies from the perspectives of both position and channel.
Extensive quantitative and qualitative experiments on publicly available benchmarks demonstrate the effectiveness of our proposed transformer framework for 3D point cloud classification and segmentation, achieving highly competitive performance in comparison with state-of-the-art approaches (a minimal sketch of the two attention types follows this entry).
arXiv Detail & Related papers (2021-04-27T08:41:02Z)
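A minimal sketch of the two attention types named above: point-wise attention relates points to points, channel-wise attention relates feature channels to channels. The exact DPCT formulation in the paper may differ; this shows only the general pattern.

```python
import torch

def point_wise_attention(x):
    # x: (B, N, C). Attention over points: which points attend to which.
    attn = torch.softmax(x @ x.transpose(1, 2) / x.shape[-1] ** 0.5, dim=-1)
    return attn @ x                                # (B, N, C)

def channel_wise_attention(x):
    # Attention over channels: which feature channels co-activate.
    attn = torch.softmax(x.transpose(1, 2) @ x / x.shape[1] ** 0.5, dim=-1)
    return x @ attn                                # (B, N, C)

x = torch.randn(2, 1024, 64)
fused = point_wise_attention(x) + channel_wise_attention(x) + x
print(fused.shape)                                 # torch.Size([2, 1024, 64])
```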
- Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long-range dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set functions (see the small demonstration after this entry).
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)
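The set-function remark is easy to demonstrate: without positional encodings, self-attention is permutation-equivariant, so permuting the input tokens simply permutes the output. A small check:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
attn.eval()                              # deterministic forward pass

x = torch.randn(1, 10, 32)               # 10 tokens, no positional encoding
perm = torch.randperm(10)

with torch.no_grad():
    y, _ = attn(x, x, x)
    y_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])

# Output of the permuted input equals the permuted output: equivariance.
print(torch.allclose(y[:, perm], y_perm, atol=1e-6))   # True
```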
This list is automatically generated from the titles and abstracts of the papers on this site.