Dual Transformer for Point Cloud Analysis
- URL: http://arxiv.org/abs/2104.13044v1
- Date: Tue, 27 Apr 2021 08:41:02 GMT
- Title: Dual Transformer for Point Cloud Analysis
- Authors: Xian-Feng Han and Yi-Fei Jin and Hui-Xian Cheng and Guo-Qiang Xiao
- Abstract summary: We present a novel point cloud representation learning architecture, named Dual Transformer Network (DTNet).
Specifically, by aggregating the well-designed point-wise and channel-wise multi-head self-attention models simultaneously, the DPCT module can capture much richer semantic contextual dependencies from the perspective of position and channel.
Extensive quantitative and qualitative experiments on publicly available benchmarks demonstrate the effectiveness of our proposed transformer framework for the tasks of 3D point cloud classification and segmentation, achieving highly competitive performance in comparison with the state-of-the-art approaches.
- Score: 2.160196691362033
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Following the tremendous success of transformers in natural language
processing and image understanding tasks, in this paper, we present a novel
point cloud representation learning architecture, named Dual Transformer
Network (DTNet), which mainly consists of the Dual Point Cloud Transformer (DPCT)
module. Specifically, by aggregating the well-designed point-wise and
channel-wise multi-head self-attention models simultaneously, the DPCT module can
capture much richer semantic contextual dependencies from the perspective
of position and channel. With the DPCT module as a fundamental component, we
construct the DTNet for performing point cloud analysis in an end-to-end
manner. Extensive quantitative and qualitative experiments on publicly
available benchmarks demonstrate the effectiveness of our proposed transformer
framework for the tasks of 3D point cloud classification and segmentation,
achieving highly competitive performance in comparison with the
state-of-the-art approaches.
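
The abstract describes aggregating point-wise and channel-wise self-attention over the same point features. The sketch below is a minimal, single-head illustration of that idea in NumPy and is not the authors' implementation: the function names, the weight shapes, the scaling factors, and the residual-sum aggregation are all assumptions made for illustration, and the multi-head splitting mentioned in the abstract is omitted for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def pointwise_attention(x, wq, wk, wv):
    """Attention across points: every point attends to every other point.

    x has shape (N, C); the attention map has shape (N, N).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (N, N)
    return attn @ v  # (N, C)


def channelwise_attention(x, wq, wk, wv):
    """Attention across channels: every channel attends to every other channel.

    x has shape (N, C); the attention map has shape (C, C).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q.T @ k / np.sqrt(q.shape[0]), axis=-1)  # (C, C)
    return v @ attn.T  # (N, C)


def dual_attention(x, point_weights, channel_weights):
    """Assumed aggregation: residual input plus the sum of both branches."""
    return (
        x
        + pointwise_attention(x, *point_weights)
        + channelwise_attention(x, *channel_weights)
    )


def make_weights(rng, channels):
    """Hypothetical helper: three small random projection matrices (Q, K, V)."""
    return [rng.standard_normal((channels, channels)) * 0.1 for _ in range(3)]


rng = np.random.default_rng(0)
n_points, n_channels = 128, 16
features = rng.standard_normal((n_points, n_channels))
out = dual_attention(
    features,
    make_weights(rng, n_channels),
    make_weights(rng, n_channels),
)
print(out.shape)
```

The point-wise branch builds an N x N attention map (cost grows with the square of the number of points), while the channel-wise branch builds a C x C map over feature channels. Both return an (N, C) array, so they can be combined element-wise.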
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well-trained on massive image collections.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z) - PointCAT: Cross-Attention Transformer for point cloud [1.3176016397292067]
We present Point Cross-Attention Transformer (PointCAT), a novel end-to-end network architecture.
Our approach combines multi-scale features via two separate cross-attention transformer branches.
Our method outperforms or achieves comparable performance to several approaches in shape classification, part segmentation and semantic segmentation tasks.
arXiv Detail & Related papers (2023-04-06T11:58:18Z) - 3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [23.0009969537045]
This paper presents a novel hierarchical framework that incorporates convolution with Transformer for point cloud classification.
Our method achieves state-of-the-art classification performance, in terms of both accuracy and efficiency.
arXiv Detail & Related papers (2022-03-02T02:42:14Z) - Deep Point Cloud Reconstruction [74.694733918351]
Point cloud obtained from 3D scanning is often sparse, noisy, and irregular.
To cope with these issues, recent studies have been separately conducted to densify, denoise, and complete inaccurate point clouds.
We propose a deep point cloud reconstruction network consisting of two stages: 1) a 3D sparse stacked-hourglass network for the initial densification and denoising, 2) a refinement via transformers converting the discrete voxels into 3D points.
arXiv Detail & Related papers (2021-11-23T07:53:28Z) - CpT: Convolutional Point Transformer for 3D Point Cloud Processing [10.389972581905]
We present CpT: Convolutional point Transformer - a novel deep learning architecture for dealing with the unstructured nature of 3D point cloud data.
CpT is an improvement over existing attention-based Convolutional Neural Networks as well as previous 3D point cloud processing transformers.
Our model can serve as an effective backbone for various point cloud processing tasks when compared to the existing state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-21T17:45:55Z) - PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object result.
The recent transformer-based image recognition model ViT also shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z) - Point Cloud Learning with Transformer [2.3204178451683264]
We introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT).
Specifically, a point pyramid transformer is investigated to model features with diverse resolutions or scales.
A multi-level transformer module is designed to aggregate contextual information from different levels of each scale and enhance their interactions.
arXiv Detail & Related papers (2021-04-28T08:39:21Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - TransReID: Transformer-based Object Re-Identification [20.02035310635418]
Vision Transformer (ViT) is a pure transformer-based model for the object re-identification (ReID) task.
With several adaptations, a strong baseline ViT-BoT is constructed with ViT as backbone.
We propose a pure-transformer framework dubbed as TransReID, which is the first work to use a pure Transformer for ReID research.
arXiv Detail & Related papers (2021-02-08T17:33:59Z) - PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.