Positional Prompt Tuning for Efficient 3D Representation Learning
- URL: http://arxiv.org/abs/2408.11567v2
- Date: Wed, 24 Sep 2025 00:39:45 GMT
- Title: Positional Prompt Tuning for Efficient 3D Representation Learning
- Authors: Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei,
- Abstract summary: We argue that using positional encoding in point Transformer-based methods serves to aggregate multi-scale features of point clouds.<n>We explore parameter-efficient fine-tuning (PEFT) through the lens of prompts and adapters.<n>PPT incorporates increased patch tokens and trainable positional encoding while keeping most pre-trained model parameters frozen.
- Score: 16.868922142742168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We rethink the role of positional encoding in 3D representation learning and fine-tuning. We argue that using positional encoding in point Transformer-based methods serves to aggregate multi-scale features of point clouds. Additionally, we explore parameter-efficient fine-tuning (PEFT) through the lens of prompts and adapters, introducing a straightforward yet effective method called PPT for point cloud analysis. PPT incorporates increased patch tokens and trainable positional encoding while keeping most pre-trained model parameters frozen. Extensive experiments validate that PPT is both effective and efficient. Our proposed method of PEFT tasks, namely PPT, with only 1.05M of parameters for training, gets state-of-the-art results in several mainstream datasets, such as 95.01% accuracy in the ScanObjectNN OBJ_BG dataset. Codes and weights will be released at https://github.com/zsc000722/PPT.
Related papers
- Depth Completion as Parameter-Efficient Test-Time Adaptation [66.72360181325877]
CAPA is a parameter-efficient test-time optimization framework that adapts pre-trained 3D foundation models (FMs) for depth completion.<n>For videos, CAPA introduces sequence-level parameter sharing, jointly adapting all frames to exploit temporal correlations, improve robustness, and enforce multi-frame consistency.
arXiv Detail & Related papers (2026-02-16T13:53:23Z) - Token Adaptation via Side Graph Convolution for Efficient Fine-tuning of 3D Point Cloud Transformers [1.19658449368018]
This paper proposes a novel PEFT algorithm called Side Token Adaptation on a neighborhood Graph (STAG) to achieve superior temporal and spatial efficiency.<n>STAG employs a graph convolutional side network operating in parallel with a frozen backbone Transformer to adapt tokens to downstream tasks.<n>We also present Point Cloud Classification 13 (PCC13), a new benchmark comprising diverse publicly available 3D point cloud datasets.
arXiv Detail & Related papers (2025-02-19T22:58:56Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.<n>Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation [15.97351561456467]
In this paper, we propose a novel PEFT approach, input-Conditioned transFormer, termed iConFormer.
We introduce an input-Conditioned Network (iCoN) in the dynamic adapter that enables instance-level feature transformation.
To be specific, iCoN generates channel-wise convolutional kernels for each feature and transform it using adaptive convolution process to effectively capture task-specific and fine-grained details tailor to downstream tasks.
arXiv Detail & Related papers (2024-09-04T16:06:23Z) - Soft Masked Transformer for Point Cloud Processing with Skip Attention-Based Upsampling [28.218242268501]
We argue that integrating task-level information into the encoding stage significantly enhances performance.
To facilitate effective communication between features from the encoding and decoding layers, we introduce a skip-attention-based up-sampling block.
We achieve state-of-the-art semantic segmentation results of 73.4% mIoU on S3DIS Area 5 and 62.4% mIoU on SWAN dataset.
arXiv Detail & Related papers (2024-03-21T04:34:24Z) - Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring point cloud pre-trained models.
Existing methods for model adaptation usually update all model parameters, which is inefficient as it relies on high computational costs.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z) - Parameter-efficient Prompt Learning for 3D Point Cloud Understanding [10.23165979353247]
This paper presents a parameter-efficient prompt tuning method to adapt a large multi-modal model for 3D point cloud understanding.
A PromptLearner module is devised to replace hand-crafted prompts with learnable contexts.
A lightweight PointAdapter module is arranged near target tasks to enhance prompt tuning for 3D point cloud understanding.
arXiv Detail & Related papers (2024-02-24T14:20:50Z) - Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models [46.42092771753465]
We introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters.
Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks.
arXiv Detail & Related papers (2023-10-04T16:49:36Z) - Improving Position Encoding of Transformers for Multivariate Time Series
Classification [5.467400475482668]
We propose a new absolute position encoding method dedicated to time series data called time Absolute Position.
We then propose a novel time series classification (MTSC) model combining tAPE/eRPE and convolution-based input encoding named ConvTran to improve the position and data embedding of time series data.
arXiv Detail & Related papers (2023-05-26T05:30:04Z) - Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models [64.49254199311137]
We propose a novel Instance-aware Dynamic Prompt Tuning (IDPT) strategy for pre-trained point cloud models.
The essence of IDPT is to develop a dynamic prompt generation module to perceive semantic prior features of each point cloud instance.
In experiments, IDPT outperforms full fine-tuning in most tasks with a mere 7% of the trainable parameters.
arXiv Detail & Related papers (2023-04-14T16:03:09Z) - Position-Guided Point Cloud Panoptic Segmentation Transformer [118.17651196656178]
This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline.
We observe that instances in the sparse point clouds are relatively small to the whole scene and often have similar geometry but lack distinctive appearance for segmentation, which are rare in the image domain.
The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% on Semantic KITTI and nuScenes benchmark, respectively.
arXiv Detail & Related papers (2023-03-23T17:59:02Z) - Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning [91.5113227694443]
We propose a novel visual.
sensuous-aware fine-Tuning (SPT) scheme.
SPT allocates trainable parameters to task-specific important positions.
Experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods.
arXiv Detail & Related papers (2023-03-15T12:34:24Z) - DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception.
Our model achieves state-of-the-art performance with a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z) - AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware
Transformers [94.11915008006483]
We present a new method that reformulates point cloud completion as a set-to-set translation problem.
We design a new model, called PoinTr, which adopts a Transformer encoder-decoder architecture for point cloud completion.
Our method attains 6.53 CD on PCN, 0.81 CD on ShapeNet-55 and 0.392 MMD on real-world KITTI.
arXiv Detail & Related papers (2023-01-11T16:14:12Z) - Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z) - Stage-Aware Feature Alignment Network for Real-Time Semantic
Segmentation of Street Scenes [59.81228011432776]
We present a novel Stage-aware Feature Alignment Network (SFANet) for real-time semantic segmentation of street scenes.
By taking into account the unique role of each stage in the decoder, a novel stage-aware Feature Enhancement Block (FEB) is designed to enhance spatial details and contextual information of feature maps from the encoder.
Experimental results show that the proposed SFANet exhibits a good balance between accuracy and speed for real-time semantic segmentation of street scenes.
arXiv Detail & Related papers (2022-03-08T11:46:41Z) - Robust Partial-to-Partial Point Cloud Registration in a Full Range [12.86951061306046]
We propose Graph Matching Consensus Network (GMCNet), which estimates pose-invariant correspondences for fullrange 1 Partial-to-Partial point cloud Registration (PPR)
GMCNet encodes point descriptors for each point cloud individually without using crosscontextual information, or ground truth correspondences for training.
arXiv Detail & Related papers (2021-11-30T17:56:24Z) - Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.