Efficient Point Transformer with Dynamic Token Aggregating for Point Cloud Processing
- URL: http://arxiv.org/abs/2405.15827v1
- Date: Thu, 23 May 2024 20:50:50 GMT
- Title: Efficient Point Transformer with Dynamic Token Aggregating for Point Cloud Processing
- Authors: Dening Lu, Jun Zhou, Kyle, Gao, Linlin Xu, Jonathan Li,
- Abstract summary: We propose an efficient point TransFormer with Dynamic Token Aggregating (DTA-Former) for point cloud representation and processing.
It achieves SOTA performance with up to 30$times$ faster than prior point Transformers on ModelNet40, ShapeNet, and airborne MultiSpectral LiDAR (MS-LiDAR) datasets.
- Score: 19.73918716354272
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, point cloud processing and analysis have made great progress due to the development of 3D Transformers. However, existing 3D Transformer methods usually are computationally expensive and inefficient due to their huge and redundant attention maps. They also tend to be slow due to requiring time-consuming point cloud sampling and grouping processes. To address these issues, we propose an efficient point TransFormer with Dynamic Token Aggregating (DTA-Former) for point cloud representation and processing. Firstly, we propose an efficient Learnable Token Sparsification (LTS) block, which considers both local and global semantic information for the adaptive selection of key tokens. Secondly, to achieve the feature aggregation for sparsified tokens, we present the first Dynamic Token Aggregating (DTA) block in the 3D Transformer paradigm, providing our model with strong aggregated features while preventing information loss. After that, a dual-attention Transformer-based Global Feature Enhancement (GFE) block is used to improve the representation capability of the model. Equipped with LTS, DTA, and GFE blocks, DTA-Former achieves excellent classification results via hierarchical feature learning. Lastly, a novel Iterative Token Reconstruction (ITR) block is introduced for dense prediction whereby the semantic features of tokens and their semantic relationships are gradually optimized during iterative reconstruction. Based on ITR, we propose a new W-net architecture, which is more suitable for Transformer-based feature learning than the common U-net design. Extensive experiments demonstrate the superiority of our method. It achieves SOTA performance with up to 30$\times$ faster than prior point Transformers on ModelNet40, ShapeNet, and airborne MultiSpectral LiDAR (MS-LiDAR) datasets.
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - 3D Learnable Supertoken Transformer for LiDAR Point Cloud Scene Segmentation [19.94836580257577]
This paper proposes a novel 3D Transformer framework, named 3D Learnable Supertoken Transformer (3DLST)
The 3DLST is equipped with a novel W-net architecture instead of the common U-net design.
It achieves satisfactory results in terms of algorithm efficiency, which is up to 5x faster than previous best-performing methods.
arXiv Detail & Related papers (2024-05-23T20:41:15Z) - AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware
Transformers [94.11915008006483]
We present a new method that reformulates point cloud completion as a set-to-set translation problem.
We design a new model, called PoinTr, which adopts a Transformer encoder-decoder architecture for point cloud completion.
Our method attains 6.53 CD on PCN, 0.81 CD on ShapeNet-55 and 0.392 MMD on real-world KITTI.
arXiv Detail & Related papers (2023-01-11T16:14:12Z) - CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point
Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z) - Dynamic Spatial Sparsification for Efficient Vision Transformers and
Convolutional Neural Networks [88.77951448313486]
We present a new approach for model acceleration by exploiting spatial sparsity in visual data.
We propose a dynamic token sparsification framework to prune redundant tokens.
We extend our method to hierarchical models including CNNs and hierarchical vision Transformers.
arXiv Detail & Related papers (2022-07-04T17:00:51Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - 3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [23.0009969537045]
This paper presents a novel hierarchical framework that incorporates convolution with Transformer for point cloud classification.
Our method achieves state-of-the-art classification performance, in terms of both accuracy and efficiency.
arXiv Detail & Related papers (2022-03-02T02:42:14Z) - CpT: Convolutional Point Transformer for 3D Point Cloud Processing [10.389972581905]
We present CpT: Convolutional point Transformer - a novel deep learning architecture for dealing with the unstructured nature of 3D point cloud data.
CpT is an improvement over existing attention-based Convolutions Neural Networks as well as previous 3D point cloud processing transformers.
Our model can serve as an effective backbone for various point cloud processing tasks when compared to the existing state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-21T17:45:55Z) - Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.