Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost
3D Point Cloud Data-scarce Learning?
- URL: http://arxiv.org/abs/2304.10224v2
- Date: Fri, 4 Aug 2023 09:19:43 GMT
- Title: Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost
3D Point Cloud Data-scarce Learning?
- Authors: Haoyang Peng, Baopu Li, Bo Zhang, Xin Chen, Tao Chen, Hongyuan Zhu
- Abstract summary: This work proposes a novel Multi-view Vision-Prompt Fusion Network (MvNet) for few-shot 3D point cloud classification.
MvNet achieves new state-of-the-art performance for few-shot 3D point cloud classification.
- Score: 38.06639044139636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point cloud based 3D deep models have wide applications in areas such as autonomous driving and household robotics. Inspired by recent
prompt learning in natural language processing, this work proposes a novel
Multi-view Vision-Prompt Fusion Network (MvNet) for few-shot 3D point cloud
classification. MvNet investigates the possibility of leveraging the
off-the-shelf 2D pre-trained models to achieve the few-shot classification,
which can alleviate the over-dependence issue of the existing baseline models
towards the large-scale annotated 3D point cloud data. Specifically, MvNet
first encodes a 3D point cloud into multi-view image features for a number of
different views. Then, a novel multi-view prompt fusion module is developed to
effectively fuse information from different views to bridge the gap between 3D
point cloud data and 2D pre-trained models. A set of 2D image prompts can then
be derived to better describe the suitable prior knowledge for a large-scale
pre-trained image model for few-shot 3D point cloud classification. Extensive
experiments on ModelNet, ScanObjectNN, and ShapeNet datasets demonstrate that
MvNet achieves new state-of-the-art performance for few-shot 3D point cloud classification. The source code of this work will be available soon.
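The abstract does not spell out how the multi-view encoding, the prompt fusion module, and the frozen 2D backbone fit together, so the following PyTorch sketch is only an illustration of that pipeline under stated assumptions: the point cloud is assumed to be rendered offline into V view images, `MultiViewPromptFusion` uses ordinary cross-view self-attention as a stand-in for the paper's fusion module, and the derived prompts are simply pooled for classification rather than injected back into the frozen pre-trained model. All class and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class MultiViewPromptFusion(nn.Module):
    """Illustrative cross-view fusion: every view attends to all other views."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (B, V, C) -- one feature vector per rendered view
        fused, _ = self.attn(view_feats, view_feats, view_feats)
        return self.norm(view_feats + fused)

class MvNetSketch(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int, image_backbone: nn.Module):
        super().__init__()
        # Frozen 2D pre-trained model, assumed to return a (B*V, feat_dim) feature per image.
        self.backbone = image_backbone
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.fusion = MultiViewPromptFusion(feat_dim)
        self.prompt_proj = nn.Linear(feat_dim, feat_dim)   # per-view 2D image prompts
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, multi_view_images: torch.Tensor) -> torch.Tensor:
        # multi_view_images: (B, V, 3, H, W), rendered offline from the point cloud
        b, v = multi_view_images.shape[:2]
        flat = multi_view_images.flatten(0, 1)              # (B*V, 3, H, W)
        feats = self.backbone(flat).reshape(b, v, -1)       # (B, V, C)
        fused = self.fusion(feats)                          # fuse information across views
        prompts = self.prompt_proj(fused)                   # derive per-view prompts
        return self.classifier(prompts.mean(dim=1))         # pool views, classify
```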
Related papers
- Point Cloud Self-supervised Learning via 3D to Multi-view Masked Autoencoder [21.73287941143304]
Multi-Modality Masked AutoEncoders (MAE) methods leverage both 2D images and 3D point clouds for pre-training.
We introduce a novel approach employing a 3D to multi-view masked autoencoder to fully harness the multi-modal attributes of 3D point clouds.
Our method outperforms state-of-the-art counterparts by a large margin in a variety of downstream tasks.
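As a rough illustration of the 3D-to-multi-view idea, the snippet below sketches one pre-training step in which a fraction of the input points is masked, only the visible points are encoded, and a decoder is asked to reconstruct the cloud's multi-view images. The masking scheme, the choice of rendered targets, and the MSE loss are assumptions, and `encoder`/`decoder` are placeholders for whatever networks the paper actually uses.

```python
import torch
import torch.nn as nn

def pretrain_step(points, view_images, encoder, decoder, mask_ratio=0.6):
    """One hypothetical 3D-to-multi-view masked-autoencoder step.

    points:      (B, N, 3) input point clouds
    view_images: (B, V, C, H, W) multi-view images used as reconstruction targets
    """
    B, N, _ = points.shape
    # Randomly mask a fraction of the points (real methods usually mask point patches).
    keep = int(N * (1.0 - mask_ratio))
    idx = torch.rand(B, N, device=points.device).argsort(dim=1)[:, :keep]
    visible = torch.gather(points, 1, idx.unsqueeze(-1).expand(-1, -1, 3))

    latent = encoder(visible)        # 3D encoder sees visible points only
    recon = decoder(latent)          # predicts the V view images, (B, V, C, H, W)
    return nn.functional.mse_loss(recon, view_images)
```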
arXiv Detail & Related papers (2023-11-17T22:10:03Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
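The voting step lends itself to a compact sketch: given per-point class predictions lifted from the 2D masks of several vision models, a majority vote yields the 3D pseudo label. The function below is a minimal NumPy version under that reading; the array layout and the `ignore_index` convention are assumptions, and the paper's actual fusion strategy may weight sources differently.

```python
import numpy as np

def fuse_point_labels_by_voting(per_source_labels: np.ndarray,
                                num_classes: int,
                                ignore_index: int = -1) -> np.ndarray:
    """Majority-vote fusion of per-point labels coming from several 2D sources.

    per_source_labels: (S, N) int array, S sources (models/views) x N points,
                       with ignore_index marking points a source did not label.
    Returns an (N,) array of pseudo labels (ignore_index where no source voted).
    """
    S, N = per_source_labels.shape
    votes = np.zeros((N, num_classes), dtype=np.int64)
    for s in range(S):
        valid = per_source_labels[s] != ignore_index
        votes[np.arange(N)[valid], per_source_labels[s, valid]] += 1
    fused = votes.argmax(axis=1)
    fused[votes.sum(axis=1) == 0] = ignore_index   # no evidence for this point
    return fused
```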
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models [97.58685709663287]
Generative pre-training can boost the performance of fundamental models in 2D vision.
In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training.
We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.
arXiv Detail & Related papers (2023-07-27T16:07:03Z)
- PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models [56.324516906160234]
Generalizable 3D part segmentation is important but challenging in vision and robotics.
This paper explores an alternative way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP.
We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud rendering and a novel 2D-to-3D label lifting algorithm.
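A minimal sketch of what such 2D-to-3D label lifting could look like: project every point into each rendered view, let every detected part box vote for the points it covers, and take the majority. This ignores the visibility handling and superpoint grouping the actual PartSLIP algorithm may use; `cameras` and `detections` are hypothetical interfaces.

```python
import numpy as np

def lift_2d_boxes_to_points(points_xyz, cameras, detections, num_parts,
                            ignore_index=-1):
    """Hypothetical 2D-to-3D label lifting by per-view projection and box voting.

    points_xyz: (N, 3) point coordinates
    cameras:    list of V callables, each mapping (N, 3) points -> (N, 2) pixels
    detections: list of V lists of (part_id, x0, y0, x1, y1) boxes per view
    """
    N = points_xyz.shape[0]
    votes = np.zeros((N, num_parts), dtype=np.int64)
    for project, boxes in zip(cameras, detections):
        uv = project(points_xyz)                     # pixel coords in this view
        for part_id, x0, y0, x1, y1 in boxes:
            inside = ((uv[:, 0] >= x0) & (uv[:, 0] <= x1) &
                      (uv[:, 1] >= y0) & (uv[:, 1] <= y1))
            votes[inside, part_id] += 1              # box votes for covered points
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = ignore_index    # never covered by any box
    return labels
```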
arXiv Detail & Related papers (2022-12-03T06:59:01Z)
- Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding [80.04281842702294]
We introduce the concept of the multi-view point cloud (Voint cloud) representing each 3D point as a set of features extracted from several view-points.
This novel 3D Voint cloud representation combines the compactness of 3D point cloud representation with the natural view-awareness of multi-view representation.
We deploy a Voint neural network (VointNet) with a theoretically established functional form to learn representations in the Voint space.
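A toy version of learning on such a representation, assuming each point carries one feature vector per view: a shared MLP is applied view-wise and the result is pooled over views to give a single view-aware feature per point. This is only a guess at the flavor of VointNet's functional form, not its actual architecture; `VointAggregator` is a made-up name.

```python
import torch
import torch.nn as nn

class VointAggregator(nn.Module):
    """Toy aggregation in 'Voint' space: per-point, per-view features are
    processed by a shared MLP and pooled over the view dimension."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, voints: torch.Tensor) -> torch.Tensor:
        # voints: (B, N, V, C) -- B clouds, N points, V views, C channels per view
        per_view = self.mlp(voints)            # shared across points and views
        point_feats, _ = per_view.max(dim=2)   # view pooling -> (B, N, out_dim)
        return point_feats                     # feed into any point-based head
```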
arXiv Detail & Related papers (2021-11-30T13:08:19Z)
- Image2Point: 3D Point-Cloud Understanding with Pretrained 2D ConvNets [45.78834662125001]
We show that we can indeed use the same neural net model architectures to understand both images and point-clouds.
Specifically, based on a 2D ConvNet pretrained on an image dataset, we can transfer the image model to a point-cloud model by inflating 2D convolutional filters to 3D.
The transferred model can achieve competitive performance on 3D point-cloud classification, indoor and driving scene segmentation, even beating a wide range of point-cloud models.
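The inflation trick itself is easy to illustrate. A common recipe (I3D-style inflation; the paper's exact scheme may differ) replicates each pretrained 2D kernel along a new depth axis and divides by the depth so activations keep roughly the same scale. The helper below assumes integer padding and ungrouped convolutions.

```python
import torch
import torch.nn as nn

def inflate_conv2d_to_conv3d(conv2d: nn.Conv2d, depth: int = 3) -> nn.Conv3d:
    """Inflate a pretrained 2D convolution into a 3D one by replicating the
    2D kernel along the new depth axis and rescaling to preserve magnitudes."""
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(depth, *conv2d.kernel_size),
                       stride=(1, *conv2d.stride),
                       padding=(depth // 2, *conv2d.padding),
                       bias=conv2d.bias is not None)
    with torch.no_grad():
        w2d = conv2d.weight                                   # (out, in, kH, kW)
        w3d = w2d.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
        conv3d.weight.copy_(w3d)                              # (out, in, kD, kH, kW)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d
```

Applying this to every convolution of a pretrained 2D ConvNet yields a 3D network initialized from image weights, which can then be fine-tuned on the 3D data.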
arXiv Detail & Related papers (2021-06-08T08:42:55Z)
- ParaNet: Deep Regular Representation for 3D Point Clouds [62.81379889095186]
ParaNet is a novel end-to-end deep learning framework for representing 3D point clouds.
It converts an irregular 3D point cloud into a regular 2D color image, named point geometry image (PGI).
In contrast to conventional regular representation modalities based on multi-view projection and voxelization, the proposed representation is differentiable and reversible.
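To make the idea of a point geometry image concrete, the toy functions below store each point's (x, y, z) coordinates in the channels of a regular 2D image and read them back, which is trivially reversible. Note that ParaNet learns the point-to-pixel arrangement end-to-end rather than using this fixed row-major placement, so this is purely illustrative.

```python
import numpy as np

def points_to_geometry_image(points: np.ndarray, side: int) -> np.ndarray:
    """Naive illustration of a point geometry image: write each point's
    (x, y, z) into the channels of a side x side image."""
    assert points.shape[0] <= side * side
    img = np.zeros((side, side, 3), dtype=points.dtype)
    img.reshape(-1, 3)[:points.shape[0]] = points   # row-major placement, illustrative only
    return img

def geometry_image_to_points(img: np.ndarray, num_points: int) -> np.ndarray:
    """Inverse mapping: read the xyz channels back out, showing reversibility."""
    return img.reshape(-1, 3)[:num_points]
```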
arXiv Detail & Related papers (2020-12-05T13:19:55Z)