Adaptive Point-Prompt Tuning: Fine-Tuning Heterogeneous Foundation Models for 3D Point Cloud Analysis
- URL: http://arxiv.org/abs/2509.00374v1
- Date: Sat, 30 Aug 2025 06:02:21 GMT
- Title: Adaptive Point-Prompt Tuning: Fine-Tuning Heterogeneous Foundation Models for 3D Point Cloud Analysis
- Authors: Mengke Li, Lihao Chen, Peng Zhang, Yiu-ming Cheung, Hui Huang
- Abstract summary: We propose the Adaptive Point-Prompt Tuning (APPT) method, which fine-tunes pre-trained models with a modest number of parameters. We convert raw point clouds into point embeddings by aggregating local geometry to capture spatial features, followed by linear layers. To calibrate self-attention across source domains of any modality to 3D, we introduce a prompt generator that shares weights with the point embedding module.
- Score: 51.37795317716487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient fine-tuning strategies for foundation models in 1D textual and 2D visual analysis have demonstrated remarkable efficacy. However, due to the scarcity of point cloud data, pre-training large 3D models remains a challenging task. While many efforts have been made to apply pre-trained visual models to 3D domains through "high-to-low" mapping, these approaches often lead to the loss of spatial geometries and lack a generalizable framework for adapting any modality to 3D. This paper, therefore, attempts to directly leverage point features to calibrate the heterogeneous foundation model of any modality for 3D point cloud analysis. Specifically, we propose the Adaptive Point-Prompt Tuning (APPT) method, which fine-tunes pre-trained models with a modest number of parameters, enabling direct point cloud processing without heterogeneous mappings. We convert raw point clouds into point embeddings by aggregating local geometry to capture spatial features followed by linear layers to ensure seamless utilization of frozen pre-trained models. Given the inherent disorder of point clouds, in contrast to the structured nature of images and language, we employ a permutation-invariant feature to capture the relative positions of point embeddings, thereby obtaining point tokens enriched with location information to optimize self-attention mechanisms. To calibrate self-attention across source domains of any modality to 3D and reduce computational overhead, we introduce a prompt generator that shares weights with the point embedding module, dynamically producing point-prompts without adding additional parameters. These prompts are then concatenated into a frozen foundation model, providing rich global structural information and compensating for the lack of structural context in the heterogeneous data.
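The abstract describes a three-step pipeline: aggregate local geometry into point embeddings, reuse the embedding weights as a prompt generator, and concatenate the resulting point-prompts with the tokens fed to a frozen foundation model. The following is a minimal NumPy sketch of that idea only; all function names, shapes, the max-pool aggregation, and the prompt-subsampling scheme are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of the APPT idea from the abstract: group points by local
# neighbourhoods, aggregate each neighbourhood into an embedding with a shared
# linear layer, and reuse the SAME weights to generate "point-prompts" that are
# concatenated ahead of the point tokens for a frozen transformer.
import numpy as np

rng = np.random.default_rng(0)

def knn_group(points, k):
    """For each point, gather its k nearest neighbours (including itself)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]          # (N, k) neighbour indices
    return points[idx]                          # (N, k, 3) local groups

class SharedEmbed:
    """Linear projection shared by the embedding and prompt-generation paths,
    so the prompt generator adds no extra parameters (as the abstract states)."""
    def __init__(self, dim):
        self.W = rng.standard_normal((3, dim)) * 0.1
    def __call__(self, grouped):
        # Max over neighbours is permutation-invariant local aggregation.
        local = grouped.max(axis=1)             # (N, 3)
        return local @ self.W                   # (N, dim)

def appt_tokens(points, embed, k=8, n_prompts=4):
    tokens = embed(knn_group(points, k))        # point tokens for the frozen model
    # Prompt generator reuses the same weights on a coarse subsample:
    prompts = embed(knn_group(points[:n_prompts], k))
    # Prompts are concatenated ahead of the point tokens.
    return np.concatenate([prompts, tokens], axis=0)

points = rng.standard_normal((32, 3))
embed = SharedEmbed(dim=16)
tok = appt_tokens(points, embed)
print(tok.shape)  # (36, 16): 4 prompts + 32 point tokens
```

In this sketch the key design point is that `SharedEmbed.W` is the only trainable tensor and serves both paths; the frozen foundation model itself (not shown) would simply consume the concatenated token sequence.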
Related papers
- UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting [64.31900521467362]
No existing pre-training method is equally effective for both object- and scene-level point clouds. We introduce UniPre3D, the first unified pre-training method that can be seamlessly applied to point clouds of any scale and 3D models of any architecture.
arXiv Detail & Related papers (2025-06-11T17:23:21Z) - DG-MVP: 3D Domain Generalization via Multiple Views of Point Clouds for Classification [10.744510913722817]
Deep neural networks have achieved significant success in 3D point cloud classification. In this paper, we focus on the 3D point cloud domain generalization problem and propose a novel method that generalizes to unseen point cloud domains.
arXiv Detail & Related papers (2025-04-16T19:43:32Z) - 3D Point Cloud Generation via Autoregressive Up-sampling [60.05226063558296]
We introduce a pioneering autoregressive generative model for 3D point cloud generation. Inspired by visual autoregressive modeling, we conceptualize point cloud generation as an autoregressive up-sampling process. PointARU progressively refines 3D point clouds from coarse to fine scales.
arXiv Detail & Related papers (2025-03-11T16:30:45Z) - Position-aware Guided Point Cloud Completion with CLIP Model [25.084811702682778]
We propose a rapid and efficient method to expand an unimodal framework into a multimodal framework. This approach incorporates a position-aware module designed to enhance the spatial information of the missing parts. In addition, we establish the Point-Text-Image triplet corpora PCI-TI and MVP-TI based on the existing unimodal point cloud completion dataset.
arXiv Detail & Related papers (2024-12-11T10:43:11Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation [33.07886526437753]
3D point clouds captured from real-world sensors frequently encompass noisy points due to various obstacles.
These challenges hinder the deployment of pre-trained point cloud recognition models trained on clean point clouds.
We propose CloudFixer, a test-time input adaptation method tailored for 3D point clouds.
arXiv Detail & Related papers (2024-07-23T05:35:04Z) - ParaPoint: Learning Global Free-Boundary Surface Parameterization of 3D Point Clouds [52.03819676074455]
ParaPoint is an unsupervised neural learning pipeline for achieving global free-boundary surface parameterization.
This work makes the first attempt to investigate neural point cloud parameterization that pursues both global mappings and free boundaries.
arXiv Detail & Related papers (2024-03-15T14:35:05Z) - Flow-based GAN for 3D Point Cloud Generation from a Single Image [16.04710129379503]
We introduce a hybrid explicit-implicit generative modeling scheme, which inherits the flow-based explicit generative models for sampling point clouds with arbitrary resolutions.
We evaluate on the large-scale synthetic dataset ShapeNet, with the experimental results demonstrating the superior performance of the proposed method.
arXiv Detail & Related papers (2022-10-08T17:58:20Z) - Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation [78.6612285236938]
We propose a novel Dual Adaptive Transformations (DAT) model for weakly supervised point cloud segmentation.
We evaluate our proposed DAT model with two popular backbones on the large-scale S3DIS and ScanNet-V2 datasets.
arXiv Detail & Related papers (2022-07-19T05:43:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.