Image2Point: 3D Point-Cloud Understanding with Pretrained 2D ConvNets
- URL: http://arxiv.org/abs/2106.04180v1
- Date: Tue, 8 Jun 2021 08:42:55 GMT
- Title: Image2Point: 3D Point-Cloud Understanding with Pretrained 2D ConvNets
- Authors: Chenfeng Xu, Shijia Yang, Bohan Zhai, Bichen Wu, Xiangyu Yue, Wei
Zhan, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka
- Abstract summary: We show that we can indeed use the same neural net model architectures to understand both images and point-clouds.
Specifically, based on a 2D ConvNet pretrained on an image dataset, we can transfer the image model to a point-cloud model by inflating 2D convolutional filters to 3D.
The transferred model can achieve competitive performance on 3D point-cloud classification, indoor and driving scene segmentation, even beating a wide range of point-cloud models.
- Score: 45.78834662125001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D point-clouds and 2D images are different visual representations of the
physical world. While human vision can understand both representations,
computer vision models designed for 2D image and 3D point-cloud understanding
are quite different. Our paper investigates the potential for transferability
between these two representations by empirically investigating whether this
approach works, what factors affect the transfer performance, and how to make
it work even better. We discovered that we can indeed use the same neural net
model architectures to understand both images and point-clouds. Moreover, we
can transfer pretrained weights from image models to point-cloud models with
minimal effort. Specifically, based on a 2D ConvNet pretrained on an image
dataset, we can transfer the image model to a point-cloud model by
inflating 2D convolutional filters to 3D and then finetuning its input,
output, and optionally normalization layers. The transferred model can achieve
competitive performance on 3D point-cloud classification, indoor and driving
scene segmentation, even beating a wide range of point-cloud models that adopt
task-specific architectures and use a variety of tricks.
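The filter inflation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the 2D weights are laid out along the new axis or whether they are rescaled, so the replicate-and-rescale scheme below is an assumption borrowed from common practice, not necessarily the authors' procedure. The names `inflate_kernel`, `inflate_conv_weight`, and `collapse_depth` are hypothetical helpers, and plain nested lists stand in for framework tensors.

```python
def inflate_kernel(kernel_2d, depth, rescale=True):
    """Repeat a kH x kW kernel `depth` times along a new leading axis.

    With rescale=True each copy is divided by `depth`, so summing the
    inflated kernel over the new axis recovers the original 2D kernel.
    (Assumed scheme; the abstract does not specify the rescaling.)
    """
    scale = 1.0 / depth if rescale else 1.0
    return [[[w * scale for w in row] for row in kernel_2d]
            for _ in range(depth)]


def inflate_conv_weight(weight_2d, depth, rescale=True):
    """Inflate a full conv weight indexed as [out][in][kH][kW]
    into one indexed as [out][in][depth][kH][kW]."""
    return [[inflate_kernel(kernel, depth, rescale) for kernel in in_kernels]
            for in_kernels in weight_2d]


def collapse_depth(kernel_3d):
    """Sum a depth x kH x kW kernel over its depth axis."""
    rows, cols = len(kernel_3d[0]), len(kernel_3d[0][0])
    return [[sum(plane[r][c] for plane in kernel_3d) for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    # A Sobel-like 3x3 edge filter standing in for a pretrained 2D filter.
    edge = [[-1.0, 0.0, 1.0],
            [-2.0, 0.0, 2.0],
            [-1.0, 0.0, 1.0]]
    inflated = inflate_kernel(edge, depth=2)
    # Shape of the inflated kernel: depth, height, width.
    print(len(inflated), len(inflated[0]), len(inflated[0][0]))
    # Collapsing over depth gives back the 2D filter when rescale=True.
    print(collapse_depth(inflated) == edge)
```

Under this scheme, a 3D input that is constant along the new axis produces the same response as the original 2D filter. In the transfer described above, the inflated filters would then be kept while the input, output, and optionally normalization layers are finetuned.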
Related papers
- Adapt PointFormer: 3D Point Cloud Analysis via Adapting 2D Visual Transformers [38.08724410736292]
This paper attempts to leverage pre-trained models with 2D prior knowledge to accomplish the tasks for 3D point cloud analysis.
We propose the Adaptive PointFormer (APF), which fine-tunes pre-trained 2D models with only a modest number of parameters to directly process point clouds.
arXiv Detail & Related papers (2024-07-18T06:32:45Z)
- HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network.
Our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms the existing schemes.
arXiv Detail & Related papers (2024-03-18T14:18:08Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models [97.58685709663287]
Generative pre-training can boost the performance of fundamental models in 2D vision.
In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training.
We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.
arXiv Detail & Related papers (2023-07-27T16:07:03Z)
- Intrinsic Image Decomposition Using Point Cloud Representation [13.771632868567277]
We introduce Point Intrinsic Net (PoInt-Net), which leverages 3D point cloud data to concurrently estimate albedo and shading maps.
PoInt-Net is efficient, achieving consistent performance across point clouds of any size with training only required on small-scale point clouds.
arXiv Detail & Related papers (2023-07-20T14:51:28Z)
- Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost 3D Point Cloud Data-scarce Learning? [38.06639044139636]
This work proposes a novel Multi-view Vision-Prompt Fusion Network (MvNet) for few-shot 3D point cloud classification.
MvNet achieves new state-of-the-art performance for 3D few-shot point cloud image classification.
arXiv Detail & Related papers (2023-04-20T11:39:41Z)
- CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding [2.8661021832561757]
CrossPoint is a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations.
Our approach outperforms the previous unsupervised learning methods on a diverse range of downstream tasks including 3D object classification and segmentation.
arXiv Detail & Related papers (2022-03-01T18:59:01Z)
- ParaNet: Deep Regular Representation for 3D Point Clouds [62.81379889095186]
ParaNet is a novel end-to-end deep learning framework for representing 3D point clouds.
It converts an irregular 3D point cloud into a regular 2D color image, named point geometry image (PGI).
In contrast to conventional regular representation modalities based on multi-view projection and voxelization, the proposed representation is differentiable and reversible.
arXiv Detail & Related papers (2020-12-05T13:19:55Z)
- From Image Collections to Point Clouds with Self-supervised Shape and Pose Networks [53.71440550507745]
Reconstructing 3D models from 2D images is one of the fundamental problems in computer vision.
We propose a deep learning technique for 3D object reconstruction from a single image.
We learn both 3D point cloud reconstruction and pose estimation networks in a self-supervised manner.
arXiv Detail & Related papers (2020-05-05T04:25:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.