P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with
Point-to-Pixel Prompting
- URL: http://arxiv.org/abs/2208.02812v1
- Date: Thu, 4 Aug 2022 17:59:03 GMT
- Title: P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with
Point-to-Pixel Prompting
- Authors: Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu
- Abstract summary: We propose a novel Point-to-Pixel prompting for point cloud analysis.
Our method attains 89.3% accuracy on the hardest setting of ScanObjectNN.
Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet Part Segmentation.
- Score: 94.11915008006483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, pre-training big models on large-scale datasets has become a
crucial topic in deep learning. Pre-trained models with high representation
ability and transferability have achieved great success and dominate many
downstream tasks in natural language processing and 2D vision. However, it is
non-trivial to extend this pretraining-tuning paradigm to 3D vision, given the
limited training data, which are relatively inconvenient to collect. In this
paper, we provide a new perspective on leveraging pre-trained 2D knowledge in
the 3D domain to tackle this problem: tuning pre-trained image models with the
novel Point-to-Pixel prompting for point cloud analysis at a minor parameter
cost. Following the principle of prompt engineering, we transform point clouds
into colorful images with geometry-preserved projection and geometry-aware
coloring to adapt them to pre-trained image models, whose weights are kept
frozen during the end-to-end optimization of point cloud analysis tasks.
We conduct extensive experiments to demonstrate that, when combined with our
proposed Point-to-Pixel Prompting, a better pre-trained image model leads to
consistently better performance in 3D vision. Benefiting from the rapid
progress of image pre-training, our method attains 89.3% accuracy on the
hardest setting of ScanObjectNN, surpassing conventional point cloud models
with far fewer trainable parameters. Our framework also exhibits very competitive
performance on ModelNet classification and ShapeNet Part Segmentation. Code is
available at https://github.com/wangzy22/P2P
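To make the prompting idea above concrete, below is a minimal, illustrative sketch of the pipeline the abstract describes: points are projected onto the image plane, a small trainable coloring module turns the projection into an RGB image, and a frozen ImageNet-pretrained backbone plus a lightweight head perform classification. All module and function names here are assumptions for illustration, not the authors' released implementation (see the repository linked above for that).

```python
# Hedged sketch of Point-to-Pixel style prompting: only the coloring module and
# the classification head are trainable; the 2D backbone stays frozen.
import torch
import torch.nn as nn
import torchvision


def project_points(points: torch.Tensor, img_size: int = 224) -> torch.Tensor:
    """Project (B, N, 3) points onto an (img_size, img_size) grid, keeping the
    per-pixel mean depth as a simple geometric feature (illustrative only)."""
    B, N, _ = points.shape
    xy = points[..., :2]
    # Normalize x/y to [0, img_size) pixel coordinates.
    xy = (xy - xy.amin(dim=1, keepdim=True)) / (
        xy.amax(dim=1, keepdim=True) - xy.amin(dim=1, keepdim=True) + 1e-6
    )
    pix = (xy * (img_size - 1)).long()               # (B, N, 2) pixel indices
    depth = points[..., 2]                            # (B, N) depth values

    canvas = torch.zeros(B, 1, img_size, img_size, device=points.device)
    count = torch.zeros_like(canvas)
    for b in range(B):                                # simple scatter-mean; a real
        idx = pix[b, :, 1] * img_size + pix[b, :, 0]  # implementation would vectorize
        canvas[b, 0].view(-1).scatter_add_(0, idx, depth[b])
        count[b, 0].view(-1).scatter_add_(0, idx, torch.ones_like(depth[b]))
    return canvas / count.clamp(min=1.0)              # (B, 1, H, W) depth map


class P2PSketch(nn.Module):
    """Frozen 2D backbone + tiny trainable coloring and classification heads."""

    def __init__(self, num_classes: int = 15):
        super().__init__()
        # "Geometry-aware coloring" stand-in: lift the 1-channel depth map to RGB.
        self.coloring = nn.Conv2d(1, 3, kernel_size=3, padding=1)
        self.backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
        self.backbone.fc = nn.Identity()              # expose 512-d features
        for p in self.backbone.parameters():          # keep image weights frozen
            p.requires_grad = False
        self.head = nn.Linear(512, num_classes)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        img = self.coloring(project_points(points))   # point-to-pixel "prompt"
        return self.head(self.backbone(img))


if __name__ == "__main__":
    model = P2PSketch()
    logits = model(torch.rand(2, 1024, 3))            # two toy point clouds
    print(logits.shape)                               # torch.Size([2, 15])
```

In the actual method the projection and coloring steps are more sophisticated (geometry-preserved projection and geometry-aware coloring), but the split between the small set of trainable parameters and the frozen pre-trained image model follows the same pattern.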
Related papers
- Adapt PointFormer: 3D Point Cloud Analysis via Adapting 2D Visual Transformers [38.08724410736292]
This paper attempts to leverage pre-trained models with 2D prior knowledge to accomplish tasks in 3D point cloud analysis.
We propose the Adaptive PointFormer (APF), which fine-tunes pre-trained 2D models with only a modest number of parameters to directly process point clouds.
arXiv Detail & Related papers (2024-07-18T06:32:45Z)
- HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network.
Our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms the existing schemes.
arXiv Detail & Related papers (2024-03-18T14:18:08Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting (see the illustrative sketch after this list).
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models [97.58685709663287]
Generative pre-training can boost the performance of fundamental models in 2D vision.
In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training.
We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.
arXiv Detail & Related papers (2023-07-27T16:07:03Z)
- PointVST: Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation [64.858505571083]
This paper proposes a translative pre-training framework, namely PointVST.
It is driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images.
arXiv Detail & Related papers (2022-12-29T07:03:29Z)
- 3D Point Cloud Pre-training with Knowledge Distillation from 2D Images [128.40422211090078]
We propose a knowledge distillation method for 3D point cloud pre-trained models to acquire knowledge directly from the 2D representation learning model.
Specifically, we introduce a cross-attention mechanism to extract concept features from the 3D point cloud and compare them with the semantic information from 2D images.
In this scheme, the point cloud pre-trained models learn directly from rich information contained in 2D teacher models.
arXiv Detail & Related papers (2022-12-17T23:21:04Z)
- Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis [33.31864436614945]
We propose a novel pre-training method for 3D point cloud models.
Our pre-training is self-supervised by a local pixel/point level correspondence loss and a global image/point cloud level loss.
These improved models outperform existing state-of-the-art methods on various datasets and downstream tasks.
arXiv Detail & Related papers (2022-10-28T05:23:03Z)
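As referenced in the Label-Efficient 3D Point Cloud Segmentation entry above, the voting-based fusion of per-view 2D semantic masks into 3D pseudo labels can be sketched roughly as follows. The function name, the point-to-pixel interface, and the occlusion convention are assumptions made for illustration, not that paper's actual code.

```python
# Hedged sketch: each 2D view contributes a predicted semantic mask, masks are
# back-projected onto the 3D points they cover, and each point's pseudo label
# is the majority vote across all views in which it is visible.
import numpy as np


def fuse_view_labels(
    view_masks: list[np.ndarray],        # per-view (H, W) integer label maps
    point_to_pixel: list[np.ndarray],    # per-view (N, 2) pixel coords; -1 if occluded
    num_classes: int,
) -> np.ndarray:
    """Return (N,) pseudo labels obtained by majority vote over visible views."""
    num_points = point_to_pixel[0].shape[0]
    votes = np.zeros((num_points, num_classes), dtype=np.int64)
    for mask, uv in zip(view_masks, point_to_pixel):
        visible = (uv >= 0).all(axis=1)                 # points seen in this view
        u, v = uv[visible, 0], uv[visible, 1]
        labels = mask[v, u]                             # 2D prediction per visible point
        votes[np.flatnonzero(visible), labels] += 1     # one vote per view
    # Points never visible in any view default to class 0 here.
    return votes.argmax(axis=1)
```

A point visible in several views receives one vote per view, so confident, consistent 2D predictions dominate the resulting 3D pseudo label; ties resolve to the lowest class index via argmax.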