3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
- URL: http://arxiv.org/abs/2203.00828v1
- Date: Wed, 2 Mar 2022 02:42:14 GMT
- Title: 3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
- Authors: Dening Lu, Qian Xie, Linlin Xu, Jonathan Li
- Abstract summary: This paper presents a novel hierarchical framework that incorporates convolution with Transformer for point cloud classification.
Our method achieves state-of-the-art classification performance, in terms of both accuracy and efficiency.
- Score: 23.0009969537045
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although accurate and fast point cloud classification is a fundamental task
in 3D applications, it is difficult to achieve because the irregularity and
disorder of point clouds hinder effective and efficient learning of global
discriminative features. Lately, 3D
Transformers have been adopted to improve point cloud processing. Nevertheless,
massive Transformer layers tend to incur huge computational and memory costs.
This paper presents a novel hierarchical framework that incorporates
convolution with Transformer for point cloud classification, named 3D
Convolution-Transformer Network (3DCTN), to combine the strong and efficient
local feature learning ability of convolution with the remarkable global
context modeling capability of Transformer. Our method has two main modules
operating on the downsampled point sets, and each module consists of a
multi-scale local feature aggregating (LFA) block and a global feature learning
(GFL) block, which are implemented by using Graph Convolution and Transformer
respectively. We also conduct a detailed investigation on a series of
Transformer variants to explore better performance for our network. Various
experiments on ModelNet40 demonstrate that our method achieves state-of-the-art
classification performance, in terms of both accuracy and efficiency.
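The two-block pattern described in the abstract (local aggregation over a graph, followed by global attention) can be illustrated with a minimal, weight-free sketch. This is not the authors' implementation: the function names (`local_aggregate`, `global_attention`, `conv_transformer_module`) and the neighbourhood sizes are hypothetical, all learned layers and the downsampling step are omitted, and the local step is a plain max-pool of edge features over a k-nearest-neighbour graph rather than the paper's graph convolution.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_indices(points, k):
    """For each point, the indices of its k nearest neighbours (itself excluded)."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: sq_dist(p, points[j]))
        result.append(others[:k])
    return result


def local_aggregate(points, feats, k):
    """Local step (assumption): max-pool edge features [f_i, f_j - f_i] over a kNN graph."""
    out = []
    for i, nbrs in enumerate(knn_indices(points, k)):
        edges = [feats[i] + [fj - fi for fi, fj in zip(feats[i], feats[j])]
                 for j in nbrs]
        out.append([max(col) for col in zip(*edges)])
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(feats):
    """Global step (assumption): single-head scaled dot-product self-attention
    with identity query/key/value projections, i.e. no learned weights."""
    d = len(feats[0])
    out = []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in feats]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, feats)) for c in range(d)])
    return out


def conv_transformer_module(points, feats, ks=(2, 4)):
    """One hypothetical module: multi-scale local aggregation, then global attention."""
    per_scale = [local_aggregate(points, feats, k) for k in ks]
    # Concatenate the per-scale features of each point into one vector.
    local = [sum(rows, []) for rows in zip(*per_scale)]
    return global_attention(local)


random.seed(0)
points = [[random.random() for _ in range(3)] for _ in range(8)]
out = conv_transformer_module(points, points, ks=(2, 4))  # xyz doubles as input features
print(len(out), len(out[0]))  # 8 12
```

Each scale contributes 2 x 3 = 6 channels per point, so the two scales give 12-dimensional features. The attention step preserves that dimension while mixing information across all 8 points.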
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - Efficient Point Transformer with Dynamic Token Aggregating for Point Cloud Processing [19.73918716354272]
We propose an efficient point TransFormer with Dynamic Token Aggregating (DTA-Former) for point cloud representation and processing.
It achieves SOTA performance while running up to 30$\times$ faster than prior point Transformers on ModelNet40, ShapeNet, and airborne MultiSpectral LiDAR (MS-LiDAR) datasets.
arXiv Detail & Related papers (2024-05-23T20:50:50Z) - AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers [94.11915008006483]
We present a new method that reformulates point cloud completion as a set-to-set translation problem.
We design a new model, called PoinTr, which adopts a Transformer encoder-decoder architecture for point cloud completion.
Our method attains 6.53 CD on PCN, 0.81 CD on ShapeNet-55 and 0.392 MMD on real-world KITTI.
arXiv Detail & Related papers (2023-01-11T16:14:12Z) - Hierarchical Point Attention for Indoor 3D Object Detection [111.04397308495618]
This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors.
First, we propose Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning.
Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals.
arXiv Detail & Related papers (2023-01-06T18:52:12Z) - 3DGTN: 3D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation [21.054928631088575]
This paper presents a novel point cloud representational learning network, called 3D Dual Self-attention Global Local (GLocal) Transformer Network (3DGTN).
The proposed framework is evaluated on both classification and segmentation datasets.
arXiv Detail & Related papers (2022-09-21T14:34:21Z) - CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z) - Transformers in 3D Point Clouds: A Survey [27.784721081318935]
3D Transformer models have been shown to have a remarkable ability to model long-range dependencies.
This survey aims to provide a comprehensive overview of 3D Transformers designed for various tasks.
arXiv Detail & Related papers (2022-05-16T01:32:18Z) - Deep Point Cloud Reconstruction [74.694733918351]
Point clouds obtained from 3D scanning are often sparse, noisy, and irregular.
To cope with these issues, recent studies have separately addressed densifying, denoising, and completing inaccurate point clouds.
We propose a deep point cloud reconstruction network consisting of two stages: 1) a 3D sparse stacked-hourglass network for the initial densification and denoising, 2) a refinement via transformers converting the discrete voxels into 3D points.
arXiv Detail & Related papers (2021-11-23T07:53:28Z) - CpT: Convolutional Point Transformer for 3D Point Cloud Processing [10.389972581905]
We present CpT: Convolutional point Transformer - a novel deep learning architecture for dealing with the unstructured nature of 3D point cloud data.
CpT is an improvement over existing attention-based Convolutional Neural Networks as well as previous 3D point cloud processing transformers.
Our model can serve as an effective backbone for various point cloud processing tasks when compared to the existing state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-21T17:45:55Z) - Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z) - 3D Object Detection with Pointformer [29.935891419574602]
We propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively.
A Local Transformer module is employed to model interactions among points in a local region, which learns context-dependent region features at an object level.
A Global Transformer is designed to learn context-aware representations at the scene level.
arXiv Detail & Related papers (2020-12-21T15:12:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.