PointVST: Self-Supervised Pre-training for 3D Point Clouds via
View-Specific Point-to-Image Translation
- URL: http://arxiv.org/abs/2212.14197v4
- Date: Tue, 19 Dec 2023 05:54:13 GMT
- Title: PointVST: Self-Supervised Pre-training for 3D Point Clouds via
View-Specific Point-to-Image Translation
- Authors: Qijian Zhang, Junhui Hou
- Abstract summary: This paper proposes a translative pre-training framework, namely PointVST.
It is driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images.
- Score: 64.858505571083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The past few years have witnessed the great success and prevalence of
self-supervised representation learning within the language and 2D vision
communities. However, such advancements have not been fully migrated to the
field of 3D point cloud learning. Different from existing pre-training
paradigms designed for deep point cloud feature extractors that fall into the
scope of generative modeling or contrastive learning, this paper proposes a
translative pre-training framework, namely PointVST, driven by a novel
self-supervised pretext task of cross-modal translation from 3D point clouds to
their corresponding diverse forms of 2D rendered images. More specifically, we
begin by deducing view-conditioned point-wise embeddings through the insertion
of a viewpoint indicator, and then adaptively aggregate them into a
view-specific global codeword, which is fed into subsequent 2D convolutional
translation heads for image generation. Extensive experimental evaluations on
various downstream tasks demonstrate that PointVST consistently and markedly
outperforms current state-of-the-art approaches and shows satisfactory domain
transfer capability.
Our code will be publicly available at https://github.com/keeganhk/PointVST.
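For intuition, here is a minimal PyTorch sketch of the translative pipeline the
abstract describes. All module names, layer sizes, and the attention-style
aggregation are illustrative assumptions on our part, not the authors'
implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class ViewSpecificTranslator(nn.Module):
    """Hypothetical PointVST-style head: backbone point features plus a
    viewpoint indicator -> view-specific codeword -> 2D rendered image."""

    def __init__(self, feat_dim=256, view_dim=64, code_dim=512, img_ch=1):
        super().__init__()
        # Embed the viewpoint indicator (here assumed to be a 3D view direction).
        self.view_embed = nn.Sequential(nn.Linear(3, view_dim), nn.ReLU())
        # Fuse per-point backbone features with the viewpoint embedding to get
        # view-conditioned point-wise embeddings.
        self.fuse = nn.Sequential(
            nn.Linear(feat_dim + view_dim, code_dim), nn.ReLU())
        # Scalar attention scores for adaptive (weighted) pooling into a
        # view-specific global codeword.
        self.score = nn.Linear(code_dim, 1)
        # 2D convolutional translation head: codeword -> 4x4 grid -> 64x64 image.
        self.head = nn.Sequential(
            nn.ConvTranspose2d(code_dim // 16, 64, 4, stride=4),  # 4 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=4),              # 16 -> 64
            nn.ReLU(),
            nn.Conv2d(32, img_ch, 3, padding=1),
        )

    def forward(self, point_feats, view_dir):
        # point_feats: (B, N, feat_dim); view_dir: (B, 3)
        B, N, _ = point_feats.shape
        v = self.view_embed(view_dir).unsqueeze(1).expand(B, N, -1)
        h = self.fuse(torch.cat([point_feats, v], dim=-1))  # (B, N, code_dim)
        w = torch.softmax(self.score(h), dim=1)             # (B, N, 1)
        codeword = (w * h).sum(dim=1)                       # (B, code_dim)
        grid = codeword.view(B, -1, 4, 4)                   # (B, 32, 4, 4)
        return self.head(grid)                              # (B, img_ch, 64, 64)
```

During pre-training, the predicted image would be compared against a rendered
ground-truth view (e.g., a depth map or silhouette) from the same viewpoint;
afterwards the translation head is discarded and only the point cloud backbone
is fine-tuned on downstream tasks.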
Related papers
- HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network.
Our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms existing schemes (a generic distillation sketch follows this entry).
arXiv Detail & Related papers (2024-03-18T14:18:08Z)
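As a rough illustration of image-to-point knowledge distillation (not
HVDistill's exact hybrid-view objective), one could align student point
features with frozen 2D teacher features at their corresponding pixels. The
function name and the assumption that pixel-to-point correspondences are
already given (e.g., from camera calibration) are hypothetical.

```python
import torch
import torch.nn.functional as F

def distill_loss(point_feats, pixel_feats):
    """point_feats: (M, C) student features for M matched points;
    pixel_feats: (M, C) frozen 2D-teacher features at the matched pixels.
    Minimizing the cosine distance pulls point features toward the
    image-branch features."""
    p = F.normalize(point_feats, dim=-1)
    q = F.normalize(pixel_feats, dim=-1)
    return (1.0 - (p * q).sum(dim=-1)).mean()
```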
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo-labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting (a minimal voting sketch follows this entry).
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
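A minimal version of such voting-based fusion, assuming the per-model 2D masks
have already been lifted to per-point labels (the function name is
hypothetical, and the paper's aggregation may differ):

```python
import torch

def fuse_labels_by_voting(preds):
    """preds: (K, N) integer semantic labels for N points predicted by K
    different vision foundation models. Returns the per-point majority
    vote as the pseudo-label (ties are broken arbitrarily by torch.mode)."""
    return preds.mode(dim=0).values

preds = torch.tensor([[0, 1, 2],
                      [0, 1, 1],
                      [3, 1, 1]])              # 3 models, 3 points
print(fuse_labels_by_voting(preds))            # tensor([0, 1, 1])
```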
- Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models [97.58685709663287]
Generative pre-training can boost the performance of fundamental models in 2D vision.
In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training.
We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model (a toy point-to-pixel projection sketch follows this entry).
arXiv Detail & Related papers (2023-07-27T16:07:03Z)
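The core 3D-to-2D step of "taking a photo" of a point cloud can be
approximated by a toy point-to-pixel depth rasterization. This sketch assumes
camera-frame points and a pinhole intrinsic matrix K; it is not the paper's
actual projection or prediction target.

```python
import torch

def render_depth(points, K, H=64, W=64):
    """points: (N, 3) camera-frame coordinates (z pointing forward);
    K: (3, 3) pinhole intrinsics. Returns an (H, W) sparse depth image
    where the nearest point wins each pixel (z-buffering)."""
    uvz = points @ K.T                       # homogeneous pixel coordinates
    z = uvz[:, 2].clamp(min=1e-6)
    u = (uvz[:, 0] / z).round().long()
    v = (uvz[:, 1] / z).round().long()
    keep = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z = u[keep], v[keep], z[keep]
    depth = torch.full((H, W), float('inf'))
    # z-buffer: keep the nearest depth per pixel, 0 where no point projects
    depth.view(-1).scatter_reduce_(0, v * W + u, z, reduce='amin')
    depth[depth.isinf()] = 0.0
    return depth
```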
- Explore In-Context Learning for 3D Point Cloud Understanding [71.20912026561484]
We introduce a novel framework, named Point-In-Context, designed especially for in-context learning in 3D point clouds.
We propose the Joint Sampling module, carefully designed to work in tandem with the general point sampling operator (see the sampling sketch after this entry).
We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks.
arXiv Detail & Related papers (2023-06-14T17:53:21Z)
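For reference, the "general point sampling operator" in point cloud pipelines
is conventionally farthest point sampling (FPS); a minimal O(N*m)
implementation is sketched below. This is background only, not the paper's
Joint Sampling module.

```python
import torch

def farthest_point_sampling(xyz, m):
    """xyz: (N, 3) point coordinates; returns indices of m samples that are
    iteratively chosen to be farthest from all previously chosen points."""
    N = xyz.shape[0]
    idx = torch.zeros(m, dtype=torch.long)
    dist = torch.full((N,), float('inf'))   # distance to nearest chosen point
    farthest = 0                            # start from an arbitrary point
    for i in range(m):
        idx[i] = farthest
        d = ((xyz - xyz[farthest]) ** 2).sum(dim=1)
        dist = torch.minimum(dist, d)
        farthest = int(dist.argmax())
    return idx
```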
- CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn transferable 3D point cloud representations in realistic scenarios.
Specifically, we exploit naturally existing correspondences in 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios (a simplified contrastive-loss sketch follows this entry).
arXiv Detail & Related papers (2023-03-22T09:32:45Z)
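One plausible reading of instance-level text-image-point alignment is pairwise
InfoNCE over the three modalities. The following is a simplified sketch under
that assumption, not the paper's exact objective; embeddings are assumed to be
already projected to a shared dimension.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE between two aligned batches of embeddings (B, C):
    the i-th row of `a` is the positive of the i-th row of `b`."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.T / tau
    labels = torch.arange(a.shape[0], device=a.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))

def triplet_alignment_loss(text_emb, image_emb, point_emb):
    # Align all three modalities pairwise over instance-level proxies.
    return (info_nce(text_emb, image_emb) +
            info_nce(text_emb, point_emb) +
            info_nce(image_emb, point_emb))
```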
- CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding [2.8661021832561757]
CrossPoint is a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations.
Our approach outperforms previous unsupervised learning methods on a diverse range of downstream tasks, including 3D object classification and segmentation.
arXiv Detail & Related papers (2022-03-01T18:59:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.