PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal
Distillation for 3D Shape Recognition
- URL: http://arxiv.org/abs/2207.03128v4
- Date: Thu, 15 Jun 2023 06:21:09 GMT
- Title: PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal
Distillation for 3D Shape Recognition
- Authors: Qijian Zhang, Junhui Hou, Yue Qian
- Abstract summary: We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pairwise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhaustive and complicated network modifications.
- Score: 55.38462937452363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As two fundamental representation modalities of 3D objects, 3D point clouds
and multi-view 2D images record shape information from different domains of
geometric structures and visual appearances. In the current deep learning era,
remarkable progress in processing these two data modalities has been achieved
by customizing compatible 3D and 2D network architectures for each.
However, unlike multi-view image-based 2D visual modeling paradigms, which have
shown leading performance in several common 3D shape recognition benchmarks,
point cloud-based 3D geometric modeling paradigms are still highly limited by
insufficient learning capacity, due to the difficulty of extracting
discriminative features from irregular geometric signals. In this paper, we
explore the possibility of boosting deep 3D point cloud encoders by
transferring visual knowledge extracted from deep 2D image encoders under a
standard teacher-student distillation workflow. Generally, we propose PointMCD,
a unified multi-view cross-modal distillation architecture, including a
pretrained deep image encoder as the teacher and a deep point encoder as the
student. To perform heterogeneous feature alignment between 2D visual and 3D
geometric domains, we further investigate visibility-aware feature projection
(VAFP), by which point-wise embeddings are reasonably aggregated into
view-specific geometric descriptors. By pairwise aligning multi-view visual
and geometric descriptors, we can obtain more powerful deep point encoders
without exhaustive and complicated network modifications. Experiments on 3D
shape classification, part segmentation, and unsupervised learning strongly
validate the effectiveness of our method. The code and data will be publicly
available at https://github.com/keeganhk/PointMCD.
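
The following is a minimal, illustrative sketch of the distillation pipeline described above: a frozen pretrained image encoder produces per-view visual descriptors, the student point encoder produces point-wise embeddings, a visibility-aware projection aggregates them into view-specific geometric descriptors, and the two sets of descriptors are aligned view by view. The masked max-pooling, the normalized mean-squared-error alignment loss, and all module and function names here are assumptions made for illustration, not the authors' implementation; refer to https://github.com/keeganhk/PointMCD for the official code.

```python
# Hedged sketch of PointMCD-style multi-view cross-modal distillation.
# Pooling choice, loss, and all names below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def visibility_aware_feature_projection(point_feats, visibility):
    """Aggregate point-wise embeddings into view-specific geometric descriptors.

    point_feats: (B, N, C) per-point embeddings from the student point encoder.
    visibility:  (B, V, N) binary mask, 1 if point n is visible in view v.
    returns:     (B, V, C) one geometric descriptor per rendered view.
    """
    vis = visibility.float()                                   # (B, V, N)
    # Masked max-pooling over the points visible in each view (an assumption):
    # invisible points are pushed to a very negative value before the max.
    masked = point_feats.unsqueeze(1) + (vis.unsqueeze(-1) - 1.0) * 1e9
    return masked.max(dim=2).values                            # (B, V, C)


class PointMCDSketch(nn.Module):
    def __init__(self, image_encoder, point_encoder, feat_dim=512):
        super().__init__()
        self.teacher = image_encoder   # pretrained 2D encoder, kept frozen
        self.student = point_encoder   # deep point encoder, trainable
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        # Small projection head assumed here to match feature dimensions.
        self.proj = nn.Linear(feat_dim, feat_dim)

    def distillation_loss(self, points, views, visibility):
        # Teacher: per-view visual descriptors from the rendered images.
        B, V = views.shape[:2]
        with torch.no_grad():
            vis_desc = self.teacher(views.flatten(0, 1)).view(B, V, -1)
        # Student: point-wise embeddings, aggregated by VAFP into view descriptors.
        point_feats = self.student(points)                                 # (B, N, C)
        geo_desc = self.proj(
            visibility_aware_feature_projection(point_feats, visibility))  # (B, V, C)
        # Pairwise alignment of matching visual and geometric descriptors.
        return F.mse_loss(F.normalize(geo_desc, dim=-1),
                          F.normalize(vis_desc, dim=-1))
```

In this sketch the loss touches only the student's outputs and a light projection head, which mirrors the paper's claim that stronger point encoders are obtained without exhaustive modification of the backbone network.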
Related papers
- Point Cloud Self-supervised Learning via 3D to Multi-view Masked
Autoencoder [21.73287941143304]
Multi-Modality Masked AutoEncoders (MAE) methods leverage both 2D images and 3D point clouds for pre-training.
We introduce a novel approach employing a 3D to multi-view masked autoencoder to fully harness the multi-modal attributes of 3D point clouds.
Our method outperforms state-of-the-art counterparts by a large margin in a variety of downstream tasks.
arXiv Detail & Related papers (2023-11-17T22:10:03Z)
- Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud
Pre-training [65.75399500494343]
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training.
arXiv Detail & Related papers (2023-02-27T17:56:18Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D
Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D
Point Cloud Understanding [2.8661021832561757]
CrossPoint is a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations.
Our approach outperforms the previous unsupervised learning methods on a diverse range of downstream tasks including 3D object classification and segmentation.
arXiv Detail & Related papers (2022-03-01T18:59:01Z)
- ParaNet: Deep Regular Representation for 3D Point Clouds [62.81379889095186]
ParaNet is a novel end-to-end deep learning framework for representing 3D point clouds.
It converts an irregular 3D point cloud into a regular 2D color image, named the point geometry image (PGI).
In contrast to conventional regular representation modalities based on multi-view projection and voxelization, the proposed representation is differentiable and reversible.
arXiv Detail & Related papers (2020-12-05T13:19:55Z)
- Improved Modeling of 3D Shapes with Multi-view Depth Maps [48.8309897766904]
We present a general-purpose framework for modeling 3D shapes using CNNs.
Using just a single depth image of the object, we can output a dense multi-view depth map representation of 3D objects.
arXiv Detail & Related papers (2020-09-07T17:58:27Z)
- Self-supervised Feature Learning by Cross-modality and Cross-view
Correspondences [32.01548991331616]
This paper presents a novel self-supervised learning approach to learn both 2D image features and 3D point cloud features.
It exploits cross-modality and cross-view correspondences without using any annotated human labels.
The effectiveness of the learned 2D and 3D features is evaluated by transferring them on five different tasks.
arXiv Detail & Related papers (2020-04-13T02:57:25Z)