Related papers: Token Adaptation via Side Graph Convolution for Efficient Fine-tuning of 3D Point Cloud Transformers

URL: http://arxiv.org/abs/2502.14142v2
Date: Fri, 21 Feb 2025 22:56:49 GMT
Title: Token Adaptation via Side Graph Convolution for Efficient Fine-tuning of 3D Point Cloud Transformers
Authors: Takahiko Furuya
Abstract summary: This paper proposes a novel PEFT algorithm called Side Token Adaptation on a neighborhood Graph (STAG) to achieve superior temporal and spatial efficiency. STAG employs a graph convolutional side network operating in parallel with a frozen backbone Transformer to adapt tokens to downstream tasks. We also present Point Cloud Classification 13 (PCC13), a new benchmark comprising diverse publicly available 3D point cloud datasets.
Score: 1.19658449368018
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Parameter-efficient fine-tuning (PEFT) of pre-trained 3D point cloud Transformers has emerged as a promising technique for 3D point cloud analysis. While existing PEFT methods attempt to minimize the number of tunable parameters, they often suffer from high temporal and spatial computational costs during fine-tuning. This paper proposes a novel PEFT algorithm called Side Token Adaptation on a neighborhood Graph (STAG) to achieve superior temporal and spatial efficiency. STAG employs a graph convolutional side network operating in parallel with a frozen backbone Transformer to adapt tokens to downstream tasks. Through efficient graph convolution, parameter sharing, and reduced gradient computation, STAG significantly reduces both temporal and spatial costs for fine-tuning. We also present Point Cloud Classification 13 (PCC13), a new benchmark comprising diverse publicly available 3D point cloud datasets to facilitate comprehensive evaluation. Extensive experiments using multiple pre-trained models and PCC13 demonstrate the effectiveness of STAG. Specifically, STAG maintains classification accuracy comparable to existing methods while reducing tunable parameters to only 0.43M and achieving significant reductions in both computation time and memory consumption for fine-tuning. Code and benchmark will be available at: https://github.com/takahikof/STAG.
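Illustrative sketch: the abstract describes a graph convolutional side network that adapts tokens alongside a frozen backbone. The following standard-library Python toy shows that general idea only and is not the paper's architecture. The k-nearest-neighbor graph over token centers, the mean aggregation, the single linear layer with ReLU, the residual addition, and every function name here are assumptions made for illustration. Only `weight` would be trainable in this toy; the backbone tokens are treated as fixed inputs.

```python
import math
import random

def knn_graph(centers, k):
    """For each token center, return the indices of its k nearest other centers."""
    neighbors = []
    for i, p in enumerate(centers):
        order = sorted((j for j in range(len(centers)) if j != i),
                       key=lambda j: math.dist(p, centers[j]))
        neighbors.append(order[:k])
    return neighbors

def side_graph_conv(tokens, neighbors, weight):
    """Mean-aggregate neighbor tokens, then apply a linear map followed by ReLU."""
    dim = len(tokens[0])
    out = []
    for nbrs in neighbors:
        mean = [sum(tokens[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        out.append([max(0.0, sum(mean[d] * weight[d][e] for d in range(dim)))
                    for e in range(len(weight[0]))])
    return out

def adapt_tokens(frozen_tokens, centers, weight, k=4):
    """Add the side-network output to the frozen backbone tokens (residual update)."""
    delta = side_graph_conv(frozen_tokens, knn_graph(centers, k), weight)
    return [[t + u for t, u in zip(tok, upd)] for tok, upd in zip(frozen_tokens, delta)]

# Hypothetical toy data: 16 tokens with 3D centers and 8-dimensional features.
random.seed(0)
centers = [[random.gauss(0, 1) for _ in range(3)] for _ in range(16)]
frozen_tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]

# With an all-zero side weight the adapted tokens equal the frozen tokens,
# so the side path starts as a no-op and changes only as `weight` is trained.
zero_weight = [[0.0] * 8 for _ in range(8)]
adapted = adapt_tokens(frozen_tokens, centers, zero_weight)
```

In this toy, gradients would only need to flow through `side_graph_conv`, which is one plausible reading of the "reduced gradient computation" mentioned in the abstract; the paper itself should be consulted for the actual design.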

Related papers

3D Test-time Adaptation via Graph Spectral Driven Point Shift [19.664235213514743]
Graph Spectral Domain Test-Time Adaptation (GSDTTA) is a novel approach for 3D point cloud classification. It shifts adaptation to the graph spectral domain, enabling more efficient adaptation by capturing global structural properties with fewer parameters. Experimental results and ablation studies on benchmark datasets demonstrate the effectiveness of GSDTTA.
arXiv Detail & Related papers (2025-07-24T09:18:39Z)
On Geometry-Enhanced Parameter-Efficient Fine-Tuning for 3D Scene Segmentation [52.96632954620623]
We introduce a novel geometry-aware PEFT module specifically designed for 3D point cloud transformers. Our approach sets a new benchmark for efficient, scalable, and geometry-aware fine-tuning of large-scale 3D point cloud models.
arXiv Detail & Related papers (2025-05-28T15:08:36Z)
Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning [49.91297276176978]
We propose a novel Parameter-Efficient Fine-Tuning (PEFT) method for point clouds, called Point GST. Point GST freezes the pre-trained model and introduces a trainable Point Cloud Spectral Adapter (PCSA) to fine-tune parameters in the spectral domain. Extensive experiments on challenging point cloud datasets demonstrate that Point GST not only outperforms its full fine-tuning counterpart but also significantly reduces trainable parameters.
arXiv Detail & Related papers (2024-10-10T17:00:04Z)
Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference [14.030836300221756]
Sparse-Tuning is a novel PEFT method that accounts for the information redundancy in images and videos. Sparse-Tuning minimizes the quantity of tokens processed at each layer, leading to a quadratic reduction in computational and memory overhead. Our results show that Sparse-Tuning reduces GFLOPs to 62%-70% of the original ViT-B while achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-05-23T15:34:53Z)
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models. Existing methods for model adaptation usually update all model parameters, which is inefficient because it incurs high computational costs. In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
Adaptive Point Transformer [88.28498667506165]
Adaptive Point Cloud Transformer (AdaPT) is a standard PT model augmented by an adaptive token selection mechanism. AdaPT dynamically reduces the number of tokens during inference, enabling efficient processing of large point clouds.
arXiv Detail & Related papers (2024-01-26T13:24:45Z)
Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models [46.42092771753465]
We introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks.
arXiv Detail & Related papers (2023-10-04T16:49:36Z)
AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers [94.11915008006483]
We present a new method that reformulates point cloud completion as a set-to-set translation problem. We design a new model, called PoinTr, which adopts a Transformer encoder-decoder architecture for point cloud completion. Our method attains 6.53 CD on PCN, 0.81 CD on ShapeNet-55 and 0.392 MMD on real-world KITTI.
arXiv Detail & Related papers (2023-01-11T16:14:12Z)
EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder [60.52613206271329]
This paper introduces Efficient Point Cloud Learning (EPCL) for training high-quality point cloud models with a frozen CLIP transformer. Our EPCL connects the 2D and 3D modalities by semantically aligning the image features and point cloud features without paired 2D-3D data.
arXiv Detail & Related papers (2022-12-08T06:27:11Z)
CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation. We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration. The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
Point-Voxel Transformer: An Efficient Approach To 3D Deep Learning [5.236787242129767]
We present a novel 3D Transformer, called Point-Voxel Transformer (PVT), that leverages self-attention computation in points to gather global context features. Our method fully exploits the potential of the Transformer architecture, paving the road to efficient and accurate recognition results.
arXiv Detail & Related papers (2021-08-13T06:07:57Z)
DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic Voxelization [0.0]
We propose a novel two-stage framework for efficient 3D point cloud object detection. We parse the raw point cloud data directly in 3D space yet achieve impressive efficiency and accuracy. We highlight inference speeds of 75 FPS on the KITTI 3D object detection dataset and 25 FPS on the Open dataset, with satisfactory accuracy.
arXiv Detail & Related papers (2021-07-27T10:07:39Z)
SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation [71.2856098776959]
Estimating 3D motions for point clouds is challenging, since a point cloud is unordered and its density is significantly non-uniform. We propose a novel architecture named Sparse Convolution-Transformer Network (SCTN) that equips the sparse convolution with the transformer. We show that the learned relation-based contextual information is rich and helpful for matching corresponding points, benefiting scene flow estimation.
arXiv Detail & Related papers (2021-05-10T15:16:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.