Adaptive Channel Encoding Transformer for Point Cloud Analysis
- URL: http://arxiv.org/abs/2112.02507v1
- Date: Sun, 5 Dec 2021 08:18:00 GMT
- Title: Adaptive Channel Encoding Transformer for Point Cloud Analysis
- Authors: Guoquan Xu, Hezhi Cao, Jianwei Wan, Ke Xu, Yanxin Ma, Cong Zhang
- Abstract summary: A channel convolution called Transformer-Conv is designed to encode the channel.
It can encode feature channels by capturing the potential relationship between coordinates and features.
Our method is superior to state-of-the-art point cloud classification and segmentation methods on three benchmark datasets.
- Score: 6.90125287791398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer plays an increasingly important role in various computer vision
areas and remarkable achievements have also been made in point cloud analysis.
Since existing methods mainly focus on point-wise transformers, an adaptive channel encoding
transformer is proposed in this paper. Specifically, a channel convolution
called Transformer-Conv is designed to encode the channel. It can encode
feature channels by capturing the potential relationship between coordinates
and features. Compared with simply assigning attention weight to each channel,
our method aims to encode the channel adaptively. In addition, our network
adopts the neighborhood search method of low-level and high-level dual semantic
receptive fields to improve the performance. Extensive experiments show that
our method is superior to state-of-the-art point cloud classification and
segmentation methods on three benchmark datasets.
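The abstract does not specify the exact form of Transformer-Conv, but the idea of adaptively encoding channels (rather than merely re-weighting them) can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration: the function name, the channel-to-channel attention built from feature similarity, and the geometry-coupling gate are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def encode_channels(coords, feats):
    """Illustrative adaptive channel encoding for a point cloud (hypothetical sketch).

    coords: (N, 3) point coordinates; feats: (N, C) per-point features.
    Rather than only assigning a weight per channel, channels are mixed
    through a channel-to-channel attention matrix, then gated by how
    strongly each channel couples to the point coordinates.
    """
    n = coords.shape[0]
    # coordinate-feature coupling: (3, C), one column per feature channel
    coupling = coords.T @ feats / n
    scores = np.linalg.norm(coupling, axis=0)            # (C,) per-channel coupling strength
    gate = np.exp(scores - scores.max())
    gate /= gate.sum()                                   # softmax over channels
    # channel-to-channel similarity drives the adaptive mixing
    sim = feats.T @ feats / n                            # (C, C)
    attn = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)              # row-wise softmax
    return (feats @ attn.T) * gate                       # (N, C) re-encoded features
```

The contrast with plain channel attention is the mixing step: a pure attention-weighting scheme would return `feats * gate`, while this sketch first re-encodes each channel as an attention-weighted combination of all channels.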
Related papers
- A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions.
We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z)
- Improving Transformers using Faithful Positional Encoding [55.30212768657544]
We propose a new positional encoding method for a neural network architecture called the Transformer.
Unlike the standard sinusoidal positional encoding, our approach has a guarantee of not losing information about the positional order of the input sequence.
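For context, the standard sinusoidal positional encoding that this abstract contrasts with can be sketched as follows. This is the well-known formulation from the original Transformer, not the paper's proposed method:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Standard Transformer positional encoding (d_model must be even):
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    """
    pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model//2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even channels: sine
    pe[:, 1::2] = np.cos(angles)                     # odd channels: cosine
    return pe
```

Because distinct positions can produce numerically close encodings at long sequence lengths, the abstract's claim is that its alternative avoids this loss of order information.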
arXiv Detail & Related papers (2024-05-15T03:17:30Z)
- Rethinking Attention Gated with Hybrid Dual Pyramid Transformer-CNN for Generalized Segmentation in Medical Imaging [17.07490339960335]
We introduce a novel hybrid CNN-Transformer segmentation architecture (PAG-TransYnet) designed for efficiently building a strong CNN-Transformer encoder.
Our approach exploits attention gates within a Dual Pyramid hybrid encoder.
arXiv Detail & Related papers (2024-04-28T14:37:10Z)
- Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z)
- Error Correction Code Transformer [92.10654749898927]
We propose, for the first time, to extend the Transformer architecture to the soft decoding of linear codes at arbitrary block lengths.
We encode each channel's output dimension to high dimension for better representation of the bits information to be processed separately.
The proposed approach demonstrates the extreme power and flexibility of Transformers and outperforms existing state-of-the-art neural decoders by large margins at a fraction of their time complexity.
arXiv Detail & Related papers (2022-03-27T15:25:58Z)
- Adaptive Channel Encoding for Point Cloud Analysis [7.696435157444049]
In this paper, an adaptive channel encoding mechanism is proposed to capture channel relationships.
It improves the quality of the representation generated by the network by explicitly encoding the interdependence between the channels of its features.
arXiv Detail & Related papers (2021-12-05T08:20:27Z)
- Transformer Assisted Convolutional Network for Cell Instance Segmentation [5.195101477698897]
We present a transformer based approach to enhance the performance of the conventional convolutional feature extractor.
Our approach merges the convolutional feature maps with transformer-based token embeddings by applying a projection operation similar to self-attention in transformers.
arXiv Detail & Related papers (2021-10-05T18:18:31Z)
- Visual Transformer Pruning [44.43429237788078]
We present a visual transformer pruning approach, which identifies the impact of the channels in each layer and then executes pruning accordingly.
The pipeline for visual transformer pruning is as follows: 1) training with sparsity regularization; 2) pruning channels; 3) finetuning.
The reduced parameters and FLOPs ratios of the proposed algorithm are well evaluated and analyzed on ImageNet dataset to demonstrate its effectiveness.
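The pruning step of the three-stage pipeline above can be sketched as follows. This is a minimal illustration under stated assumptions: it assumes step 1 trained per-channel scale factors under an L1 sparsity penalty, and the function name, argument layout, and `keep_ratio` heuristic are all hypothetical, not the paper's exact procedure.

```python
import numpy as np

def prune_channels(weight, scales, keep_ratio=0.5):
    """Step 2 sketch: drop the channels with the smallest learned |scale|.

    weight: (C_out, C_in) layer weight matrix
    scales: (C_out,) per-channel scale factors learned with sparsity
            regularization in step 1 (assumed, per the pipeline above)
    Returns the pruned weight and the indices of the kept channels,
    which the finetuning stage (step 3) would then train further.
    """
    n_keep = max(1, int(round(len(scales) * keep_ratio)))
    # keep the n_keep channels with the largest magnitude scales
    keep = np.sort(np.argsort(np.abs(scales))[-n_keep:])
    return weight[keep], keep
```

The index map returned alongside the pruned weight is what lets downstream layers slice their input channels consistently before finetuning.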
arXiv Detail & Related papers (2021-04-17T09:49:24Z) - End-to-End Multi-Channel Transformer for Speech Recognition [9.949801888214527]
We leverage neural transformer architectures for multi-channel speech recognition systems.
Our network consists of three parts: channel-wise self-attention layers (CSA), cross-channel attention layers (CCA), and multi-channel encoder-decoder attention layers (EDA).
arXiv Detail & Related papers (2021-02-08T00:12:44Z) - Beyond Single Stage Encoder-Decoder Networks: Deep Decoders for Semantic
Image Segmentation [56.44853893149365]
Single encoder-decoder methodologies for semantic segmentation are reaching their peak in terms of segmentation quality and efficiency per number of layers.
We propose a new architecture based on a decoder which uses a set of shallow networks for capturing more information content.
In order to further improve the architecture we introduce a weight function which aims to re-balance classes to increase the attention of the networks to under-represented objects.
arXiv Detail & Related papers (2020-07-19T18:44:34Z) - Volumetric Transformer Networks [88.85542905676712]
We introduce a learnable module, the volumetric transformer network (VTN).
VTN predicts channel-wise warping fields to reconfigure intermediate CNN features both spatially and channel-wise.
Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
arXiv Detail & Related papers (2020-07-18T14:00:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.