Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
- URL: http://arxiv.org/abs/2307.14971v2
- Date: Thu, 7 Sep 2023 05:44:37 GMT
- Title: Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
- Authors: Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu
- Abstract summary: Generative pre-training can boost the performance of foundation models in 2D vision.
In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training.
We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.
- Score: 97.58685709663287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the overwhelming trend of masked image modeling led by MAE, generative pre-training has shown remarkable potential to boost the performance of foundation models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model: view images are generated from different instructed poses via a cross-attention mechanism as the pre-training task. View-image generation provides more precise supervision than its point cloud counterpart, helping 3D backbones acquire a finer understanding of the geometric structure and stereoscopic relations of point clouds. Experimental results demonstrate the superiority of the proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method also boosts the performance of architecture-oriented approaches, achieving state-of-the-art results when fine-tuned on ScanObjectNN classification and ShapeNetPart segmentation. Code is available at https://github.com/wangzy22/TAP.
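As a rough illustration of the pre-training scheme described in the abstract, here is a minimal PyTorch sketch: learnable queries conditioned on an instructed camera pose cross-attend to point tokens from an arbitrary 3D backbone, and the attended features are decoded into a view image that can be supervised against a rendered ground-truth view. All module names, dimensions, and the MSE supervision are illustrative assumptions; the authors' actual implementation lives in the linked repository.

```python
import torch
import torch.nn as nn

class ViewImageGenerator(nn.Module):
    """Sketch of 3D-to-2D generative pre-training: pose-conditioned
    queries cross-attend to point features and are decoded into a view image."""

    def __init__(self, feat_dim=384, num_queries=196, patch=16, img_hw=224):
        super().__init__()
        self.pose_embed = nn.Linear(12, feat_dim)                # flattened 3x4 camera pose (assumed encoding)
        self.query_embed = nn.Parameter(torch.randn(num_queries, feat_dim))
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads=6, batch_first=True)
        self.to_pixels = nn.Linear(feat_dim, patch * patch * 3)  # one RGB patch per query
        self.patch, self.img_hw = patch, img_hw

    def forward(self, point_feats, pose):
        # point_feats: (B, N, C) tokens from any 3D backbone; pose: (B, 12)
        B = point_feats.size(0)
        q = self.query_embed.unsqueeze(0).expand(B, -1, -1) + self.pose_embed(pose).unsqueeze(1)
        x, _ = self.cross_attn(q, point_feats, point_feats)      # queries attend to 3D tokens
        patches = self.to_pixels(x)                              # (B, num_queries, patch*patch*3)
        side = self.img_hw // self.patch                         # 14 patches per side
        img = patches.view(B, side, side, self.patch, self.patch, 3)
        img = img.permute(0, 5, 1, 3, 2, 4).reshape(B, 3, self.img_hw, self.img_hw)
        return img

# Hypothetical pre-training step: supervise with the image rendered from the same pose.
# loss = F.mse_loss(generator(backbone(points), pose), rendered_view)
```

Because the image plane is an ordered grid, a pixel-level loss gives denser, better-localized gradients than Chamfer-style point cloud reconstruction, which is the intuition behind the "more precise supervision" claim.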
Related papers
- Adapt PointFormer: 3D Point Cloud Analysis via Adapting 2D Visual Transformers [38.08724410736292]
This paper leverages pre-trained models with 2D prior knowledge to tackle 3D point cloud analysis tasks.
We propose the Adaptive PointFormer (APF), which fine-tunes pre-trained 2D models with only a modest number of parameters to process point clouds directly (a minimal adapter sketch follows this entry).
arXiv Detail & Related papers (2024-07-18T06:32:45Z)
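APF's exact adapter design is not given in the summary above, so the following is a generic bottleneck-adapter sketch of the idea it describes: freeze a pre-trained 2D transformer and train only small residual adapters while feeding it point cloud tokens. Class names and sizes are assumptions, not APF's implementation.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: the only trainable weights inside a frozen block."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual keeps the frozen path intact

class AdaptedBlock(nn.Module):
    """Wrap a pre-trained 2D transformer block: freeze it, train only the adapter."""
    def __init__(self, frozen_block, dim=768):
        super().__init__()
        self.block, self.adapter = frozen_block, Adapter(dim)
        for p in self.block.parameters():
            p.requires_grad = False           # 2D prior knowledge stays fixed

    def forward(self, tokens):                # tokens: point embeddings shaped like patch tokens
        return self.adapter(self.block(tokens))
```

Only the adapters receive gradients, which matches the "modest number of parameters" claim in the summary.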
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that combines all the results via voting (a voting sketch follows this entry).
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
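The voting-based fusion above is concrete enough to sketch. In this hypothetical implementation, per-point label predictions gathered from several 2D models (or views) are tallied and the majority label becomes the 3D pseudo label; the function name, tensor layout, and ignore_index convention are assumptions.

```python
import torch

def fuse_semantic_labels(votes, num_classes, ignore_index=-1):
    """Majority-vote fusion of per-point predictions from multiple sources.

    votes: (num_points, num_sources) integer labels, with ignore_index
    wherever a point received no prediction (e.g., occluded in that view).
    """
    counts = torch.zeros(votes.size(0), num_classes, dtype=torch.long)
    valid = (votes != ignore_index)
    counts.scatter_add_(1, votes.clamp(min=0), valid.long())  # tally valid votes per class
    pseudo = counts.argmax(dim=1)
    pseudo[valid.sum(dim=1) == 0] = ignore_index              # no evidence -> stay ignored
    return pseudo
```

For example, `fuse_semantic_labels(torch.tensor([[2, 2, 5], [-1, -1, -1]]), num_classes=6)` returns `tensor([2, -1])`: the first point takes the majority label, the second stays ignored.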
- PointVST: Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation [64.858505571083]
This paper proposes a translative pre-training framework, namely PointVST.
It is driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images.
arXiv Detail & Related papers (2022-12-29T07:03:29Z)
- 3D Point Cloud Pre-training with Knowledge Distillation from 2D Images [128.40422211090078]
We propose a knowledge distillation method for 3D point cloud pre-trained models to acquire knowledge directly from the 2D representation learning model.
Specifically, we introduce a cross-attention mechanism to extract concept features from the 3D point cloud and compare them with the semantic information from 2D images.
In this scheme, the point cloud pre-trained models learn directly from the rich information contained in 2D teacher models (a distillation sketch follows this entry).
arXiv Detail & Related papers (2022-12-17T23:21:04Z)
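The cross-attention distillation scheme in the entry above can be sketched in a few lines. In this assumed setup, 3D point tokens act as queries over the frozen 2D teacher's features, and the resulting concept features are pulled toward teacher features that have been aligned to the points (e.g., by projection); the module shapes and the cosine loss are illustrative choices, not the paper's exact recipe.

```python
import torch.nn as nn
import torch.nn.functional as F

class CrossModalDistiller(nn.Module):
    """3D tokens query 2D teacher features via cross-attention."""

    def __init__(self, dim=384, heads=6):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, point_tokens, image_tokens):
        # point_tokens: (B, Np, C) student features; image_tokens: (B, Ni, C) teacher features
        concept, _ = self.cross_attn(point_tokens, image_tokens, image_tokens)
        return concept                        # (B, Np, C) "concept features"

def distill_loss(concept, teacher_feats):
    """Pull concept features toward (detached) teacher features; negative
    cosine similarity is a common feature-distillation objective."""
    return 1 - F.cosine_similarity(concept, teacher_feats.detach(), dim=-1).mean()
```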
- P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting [94.11915008006483]
We propose a novel Point-to-Pixel prompting for point cloud analysis.
Our method attains 89.3% accuracy on the hardest setting of ScanObjectNN.
Our framework also exhibits very competitive performance on ModelNet classification and ShapeNetPart segmentation (a projection sketch follows this entry).
arXiv Detail & Related papers (2022-08-04T17:59:03Z)
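Point-to-Pixel prompting hinges on turning a point cloud into an image-like input that a frozen 2D model can consume. The sketch below is a deliberately simplified stand-in (orthographic projection, no z-buffering, no learned coloring); the real P2P pipeline is more elaborate.

```python
import torch

def points_to_pixel_prompt(xyz, feat, img_hw=224):
    """Rasterize points into a pseudo image for a frozen 2D backbone.

    xyz: (B, N, 3) coordinates; feat: (B, N, 3) per-point RGB-like features.
    Overlapping points resolve last-writer-wins (no depth test here).
    """
    B, N, _ = xyz.shape
    xy = xyz[..., :2]
    lo, hi = xy.amin(dim=1, keepdim=True), xy.amax(dim=1, keepdim=True)
    uv = ((xy - lo) / (hi - lo + 1e-6) * (img_hw - 1)).long()   # (B, N, 2) pixel coords
    idx = uv[..., 1] * img_hw + uv[..., 0]                      # flattened row-major index
    img = torch.zeros(B, 3, img_hw * img_hw)
    img.scatter_(2, idx.unsqueeze(1).expand(-1, 3, -1), feat.transpose(1, 2))
    return img.view(B, 3, img_hw, img_hw)                       # feed to the frozen 2D model
```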
- From Image Collections to Point Clouds with Self-supervised Shape and Pose Networks [53.71440550507745]
Reconstructing 3D models from 2D images is one of the fundamental problems in computer vision.
We propose a deep learning technique for 3D object reconstruction from a single image.
We learn both 3D point cloud reconstruction and pose estimation networks in a self-supervised manner (a training-step sketch follows this entry).
arXiv Detail & Related papers (2020-05-05T04:25:16Z)
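The self-supervised shape-and-pose training amounts to a cross-view consistency step, sketched below. It assumes a differentiable point cloud renderer `project` and paired views of the same object; both assumptions are mine, and the paper's actual losses differ in detail.

```python
import torch.nn.functional as F

def self_supervised_step(shape_net, pose_net, project, img_a, img_b):
    """One hypothetical training step: reconstruct a point cloud from view A,
    predict the camera pose of view B, and require the reconstruction to
    re-project onto view B. No 3D ground truth is used anywhere."""
    points = shape_net(img_a)               # (B, N, 3) reconstructed shape
    pose = pose_net(img_b)                  # camera pose predicted for the second view
    reprojected = project(points, pose)     # differentiable rendering (assumed available)
    return F.mse_loss(reprojected, img_b)   # cross-view consistency loss
```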
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.