Related papers: Efficient Point Transformer with Dynamic Token Aggregating for LiDAR Point Cloud Processing

Efficient Point Transformer with Dynamic Token Aggregating for LiDAR Point Cloud Processing

URL: http://arxiv.org/abs/2405.15827v2
Date: Wed, 25 Dec 2024 06:20:14 GMT
Title: Efficient Point Transformer with Dynamic Token Aggregating for LiDAR Point Cloud Processing
Authors: Dening Lu, Jun Zhou, Kyle, Gao, Linlin Xu, Jonathan Li,
Abstract summary: LiDAR point cloud processing and analysis have made great progress due to the development of 3D Transformers.<n>Existing 3D Transformer methods usually are computationally expensive and inefficient due to their huge and redundant attention maps.<n>We propose an efficient point TransFormer with Dynamic Token Aggregating (DTA-Former) for point cloud representation and processing.
Score: 19.73918716354272
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recently, LiDAR point cloud processing and analysis have made great progress due to the development of 3D Transformers. However, existing 3D Transformer methods usually are computationally expensive and inefficient due to their huge and redundant attention maps. They also tend to be slow due to requiring time-consuming point cloud sampling and grouping processes. To address these issues, we propose an efficient point TransFormer with Dynamic Token Aggregating (DTA-Former) for point cloud representation and processing. Firstly, we propose an efficient Learnable Token Sparsification (LTS) block, which considers both local and global semantic information for the adaptive selection of key tokens. Secondly, to achieve the feature aggregation for sparsified tokens, we present the first Dynamic Token Aggregating (DTA) block in the 3D Transformer paradigm, providing our model with strong aggregated features while preventing information loss. After that, a dual-attention Transformer-based Global Feature Enhancement (GFE) block is used to improve the representation capability of the model. Equipped with LTS, DTA, and GFE blocks, DTA-Former achieves excellent classification results via hierarchical feature learning. Lastly, a novel Iterative Token Reconstruction (ITR) block is introduced for dense prediction whereby the semantic features of tokens and their semantic relationships are gradually optimized during iterative reconstruction. Based on ITR, we propose a new W-net architecture, which is more suitable for Transformer-based feature learning than the common U-net design.

Related papers

Pre-Trained CNN Architecture for Transformer-Based Image Caption Generation Model [0.0]
This project presents a guide to constructing and comprehending transformer models for image captioning.<n>We leverage the well-established Transformer architecture, recognized for its effectiveness in managing sequential data.<n>Our approach exemplifies the utilization of parallelization for efficient training and inference.
arXiv Detail & Related papers (2025-09-22T05:32:52Z)
H$_{2}$OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers [124.11648300910444]
We present a hierarchical plug-and-play pruning-and-$-recovering framework, called Hierarchical Hourglass Tokenizer (H$_2$OT)<n>Our method is general-purpose: it can be easily incorporated into common VPT models on both seq2seq and seq2frame pipelines.
arXiv Detail & Related papers (2025-09-08T17:59:59Z)
FASTer: Focal Token Acquiring-and-Scaling Transformer for Long-term 3D Object Detection [9.291995455336929]
We propose a Focal Token Acquring-and-Scaling Transformer (FASTer) FASTer condenses token sequences in an adaptive and lightweight manner. It significantly outperforms other state-of-the-art detectors in both performance and efficiency.
arXiv Detail & Related papers (2025-02-28T03:15:33Z)
Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries. We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
3D Learnable Supertoken Transformer for LiDAR Point Cloud Scene Segmentation [19.94836580257577]
This paper proposes a novel 3D Transformer framework, named 3D Learnable Supertoken Transformer (3DLST) The 3DLST is equipped with a novel W-net architecture instead of the common U-net design. It achieves satisfactory results in terms of algorithm efficiency, which is up to 5x faster than previous best-performing methods.
arXiv Detail & Related papers (2024-05-23T20:41:15Z)
AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers [94.11915008006483]
We present a new method that reformulates point cloud completion as a set-to-set translation problem. We design a new model, called PoinTr, which adopts a Transformer encoder-decoder architecture for point cloud completion. Our method attains 6.53 CD on PCN, 0.81 CD on ShapeNet-55 and 0.392 MMD on real-world KITTI.
arXiv Detail & Related papers (2023-01-11T16:14:12Z)
ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention. Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count. The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation. We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration. The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks [88.77951448313486]
We present a new approach for model acceleration by exploiting spatial sparsity in visual data. We propose a dynamic token sparsification framework to prune redundant tokens. We extend our method to hierarchical models including CNNs and hierarchical vision Transformers.
arXiv Detail & Related papers (2022-07-04T17:00:51Z)
Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects. The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification [23.0009969537045]
This paper presents a novel hierarchical framework that incorporates convolution with Transformer for point cloud classification. Our method achieves state-of-the-art classification performance, in terms of both accuracy and efficiency.
arXiv Detail & Related papers (2022-03-02T02:42:14Z)
CpT: Convolutional Point Transformer for 3D Point Cloud Processing [10.389972581905]
We present CpT: Convolutional point Transformer - a novel deep learning architecture for dealing with the unstructured nature of 3D point cloud data. CpT is an improvement over existing attention-based Convolutions Neural Networks as well as previous 3D point cloud processing transformers. Our model can serve as an effective backbone for various point cloud processing tasks when compared to the existing state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-21T17:45:55Z)
Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution. We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids. The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.