CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D
Point Cloud Understanding
- URL: http://arxiv.org/abs/2203.00680v2
- Date: Wed, 2 Mar 2022 12:36:47 GMT
- Title: CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D
Point Cloud Understanding
- Authors: Mohamed Afham, Isuru Dissanayake, Dinithi Dissanayake, Amaya
Dharmasiri, Kanchana Thilakarathna, Ranga Rodrigo
- Abstract summary: CrossPoint is a simple cross-modal contrastive learning approach to learn transferable 3D point cloud representations.
Our approach outperforms previous unsupervised learning methods on a diverse range of downstream tasks, including 3D object classification and segmentation.
- Score: 2.8661021832561757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Manual annotation of large-scale point cloud datasets for varying
tasks such as 3D object classification, segmentation and detection is often
laborious owing to the irregular structure of point clouds. Self-supervised
learning, which operates without any human labeling, is a promising approach
to address this issue. We observe in the real world that humans are capable
of mapping the visual concepts learnt from 2D images to understand the 3D
world. Encouraged by this insight, we propose CrossPoint, a simple
cross-modal contrastive learning approach to learn transferable 3D point
cloud representations. It enables a 3D-2D correspondence of objects by
maximizing agreement between a point cloud and its corresponding rendered 2D
image in the invariant space, while encouraging invariance to transformations
in the point cloud modality. Our joint training objective combines the
feature correspondences within and across modalities, thus ensembling a rich
learning signal from both the 3D point cloud and 2D image modalities in a
self-supervised fashion. Experimental results show that our approach
outperforms previous unsupervised learning methods on a diverse range of
downstream tasks including 3D object classification and segmentation.
Further, ablation studies validate the effectiveness of our approach for
better point cloud understanding. Code and pretrained models are available at
http://github.com/MohamedAfham/CrossPoint.
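As described above, the joint objective pairs an intra-modal contrastive term (two augmented views of the same cloud) with a cross-modal term (the cloud against its rendered image). Below is a minimal sketch of such an objective in PyTorch, assuming hypothetical encoders `f_pc` and `g_img`; the prototype averaging, temperature, and loss weighting are illustrative, not the repository's exact code.

```python
# Minimal sketch of a CrossPoint-style joint contrastive objective.
# f_pc, g_img, the prototype averaging, and the loss weighting are
# illustrative assumptions, not the authors' exact implementation.
import torch
import torch.nn.functional as F

def nt_xent(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """NT-Xent loss: row i of z_a should match row i of z_b."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau                      # (B, B) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)           # positives on the diagonal

def crosspoint_style_loss(f_pc, g_img, pts_v1, pts_v2, imgs, w_cross=1.0):
    """f_pc: point-cloud encoder; g_img: image encoder.
    pts_v1, pts_v2: two augmented views of the same batch of clouds;
    imgs: the corresponding rendered 2D images."""
    z1, z2 = f_pc(pts_v1), f_pc(pts_v2)               # intra-modal views
    z_img = g_img(imgs)                               # cross-modal anchor
    z_proto = 0.5 * (z1 + z2)                         # cloud prototype (assumption)
    intra = nt_xent(z1, z2)                           # invariance to transforms
    cross = nt_xent(z_proto, z_img)                   # 3D-2D correspondence
    return intra + w_cross * cross
```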
Related papers
- Pic@Point: Cross-Modal Learning by Local and Global Point-Picture Correspondence
We present Pic@Point, an effective contrastive learning method based on structural 2D-3D correspondences.
We leverage image cues rich in semantic and contextual knowledge to provide a guiding signal for point cloud representations.
arXiv Detail & Related papers (2024-10-12T12:43:41Z)
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose the learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
arXiv Detail & Related papers (2024-01-23T02:41:06Z)
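The entry above estimates a rigid pose from dense 2D-3D correspondences. As a point of reference, once correspondences are lifted to matched 3D point pairs, the classical Kabsch/Procrustes algorithm recovers the least-squares rotation and translation; the sketch below shows that standard construction, not the paper's learnable alignment module.

```python
# Reference sketch: recovering a rigid pose (R, t) from matched 3D point
# pairs with the classical Kabsch/Procrustes algorithm. This is a standard
# construction, not the paper's learnable transformation alignment.
import numpy as np

def kabsch_rigid_pose(src: np.ndarray, dst: np.ndarray):
    """Least-squares R, t such that dst ~ src @ R.T + t.
    src, dst: (N, 3) arrays of corresponding points."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                         # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```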
- Point Cloud Self-supervised Learning via 3D to Multi-view Masked Autoencoder
Multi-Modality Masked AutoEncoder (MAE) methods leverage both 2D images and 3D point clouds for pre-training.
We introduce a novel approach employing a 3D to multi-view masked autoencoder to fully harness the multi-modal attributes of 3D point clouds.
Our method outperforms state-of-the-art counterparts by a large margin in a variety of downstream tasks.
arXiv Detail & Related papers (2023-11-17T22:10:03Z)
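For readers unfamiliar with masked autoencoding on point clouds, the sketch below shows the generic recipe the entry builds on: group points into patches, mask most of them, and reconstruct the masked geometry. The grouping, mask ratio, and Chamfer loss are assumptions; this paper's decoder additionally targets multi-view images.

```python
# Generic masked-autoencoding recipe for point clouds: mask most patches,
# encode the visible ones, reconstruct the masked geometry. Grouping, mask
# ratio, and the Chamfer loss are assumptions, not the paper's exact design.
import torch

def random_patch_mask(patches: torch.Tensor, mask_ratio: float = 0.6):
    """patches: (B, G, K, 3) point patches. Returns (visible, masked) splits."""
    B, G = patches.shape[:2]
    ids = torch.rand(B, G).argsort(dim=1)       # random permutation per cloud
    n_vis = int(G * (1 - mask_ratio))
    batch = torch.arange(B)[:, None]
    return patches[batch, ids[:, :n_vis]], patches[batch, ids[:, n_vis:]]

def chamfer(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point sets a and b: (..., N, 3)."""
    d = torch.cdist(a, b)                       # pairwise point distances
    return d.min(-1).values.mean() + d.min(-2).values.mean()

# Training would minimize chamfer(decoder(encoder(visible)), masked).
```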
- Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration
We present a novel Cross-Modal Information-Guided Network (CMIGNet) for point cloud registration.
We first incorporate the projected images from the point clouds and fuse the cross-modal features using the attention mechanism.
We employ two contrastive learning strategies, namely overlapping contrastive learning and cross-modal contrastive learning.
arXiv Detail & Related papers (2023-11-02T12:56:47Z)
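The attention-based fusion mentioned above can be pictured as point tokens querying image tokens. The single-head module below is a minimal sketch of that pattern; the dimensions and residual fusion are assumptions, not CMIGNet's actual architecture.

```python
# Minimal single-head cross-attention: point tokens query image tokens and
# fuse the attended image cues back into the point features. Dimensions and
# the residual fusion are assumptions, not CMIGNet's actual design.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.q = nn.Linear(dim, dim)            # queries from point features
        self.kv = nn.Linear(dim, 2 * dim)       # keys/values from image features
        self.scale = dim ** -0.5

    def forward(self, pts: torch.Tensor, img: torch.Tensor) -> torch.Tensor:
        """pts: (B, N, C) point tokens; img: (B, M, C) image tokens."""
        q = self.q(pts)
        k, v = self.kv(img).chunk(2, dim=-1)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        return pts + attn @ v                   # residual fusion of image cues
```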
- CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn transferable 3D point cloud representations in realistic scenarios.
Specifically, we exploit naturally existing correspondences in 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios.
arXiv Detail & Related papers (2023-03-22T09:32:45Z)
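Aligning text, image, and point embeddings pairwise is the core of such triplet-proxy pretraining. A hedged sketch follows; the symmetric InfoNCE loss and equal weighting of the three pairs are assumptions, not CLIP$^2$'s exact objective.

```python
# Pairwise alignment of text, image, and point embeddings for matching
# instances. The symmetric InfoNCE and equal pair weighting are assumptions.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    logits = a @ b.t() / tau
    y = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, y) + F.cross_entropy(logits.t(), y))

def triplet_alignment_loss(z_text, z_img, z_pts):
    """Pull all three modality pairs together for matching instances."""
    return info_nce(z_text, z_pts) + info_nce(z_img, z_pts) + info_nce(z_text, z_img)
```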
- PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhaustive and complicated network modifications.
arXiv Detail & Related papers (2022-07-07T07:23:20Z)
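The pair-wise alignment described above can be sketched as a frozen image teacher supervising a point student: the student's cloud embedding is matched to the teacher's embedding of each rendered view. The cosine-distance objective below is an illustrative stand-in, not PointMCD's actual distillation loss.

```python
# A frozen image teacher supervising a point student: match the student's
# cloud embedding to the teacher's embedding of every rendered view. The
# cosine-distance objective is an illustrative stand-in for the actual loss.
import torch
import torch.nn.functional as F

def multiview_distill_loss(student_pts: torch.Tensor,
                           teacher_views: torch.Tensor) -> torch.Tensor:
    """student_pts: (B, C) point-cloud embedding;
    teacher_views: (B, V, C) frozen embeddings of V rendered views."""
    s = F.normalize(student_pts, dim=-1).unsqueeze(1)   # (B, 1, C)
    t = F.normalize(teacher_views, dim=-1)              # (B, V, C)
    return (1.0 - (s * t).sum(-1)).mean()               # mean cosine distance
```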
- SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations
We propose a 2D image and 3D point cloud unsupervised pre-training strategy, called SimIPU.
Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module and an inter-modal feature interaction module.
To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets.
arXiv Detail & Related papers (2021-12-09T03:27:00Z)
- Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching
We propose an unsupervised approach for 3D point cloud generation with fine structures.
Our method can recover fine 3D structures from 2D silhouette images at different resolutions.
arXiv Detail & Related papers (2021-08-08T22:15:31Z)
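2D projection matching amounts to projecting a generated cloud through a known camera and comparing it against silhouette samples in the image plane. The pinhole projection and 2D Chamfer loss below are generic stand-ins, not the paper's exact formulation.

```python
# Generic 2D projection matching: project a generated cloud with a pinhole
# camera and compare against silhouette sample points with a 2D Chamfer
# loss. Both pieces are stand-ins for the paper's exact formulation.
import torch

def project_pinhole(pts: torch.Tensor, K: torch.Tensor, Rt: torch.Tensor) -> torch.Tensor:
    """pts: (N, 3) world points; K: (3, 3) intrinsics; Rt: (3, 4) extrinsics."""
    cam = (Rt[:, :3] @ pts.t() + Rt[:, 3:]).t()         # world -> camera frame
    uvw = (K @ cam.t()).t()                             # camera -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)      # perspective divide

def chamfer_2d(proj: torch.Tensor, sil: torch.Tensor) -> torch.Tensor:
    """Symmetric 2D Chamfer between projected points and silhouette samples."""
    d = torch.cdist(proj, sil)
    return d.min(1).values.mean() + d.min(0).values.mean()
```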
- Point Discriminative Learning for Unsupervised Representation Learning on 3D Point Clouds
We propose a point discriminative learning method for unsupervised representation learning on 3D point clouds.
We achieve this by imposing a novel point discrimination loss on the middle-level and global-level point features.
Our method learns powerful representations and achieves new state-of-the-art performance.
arXiv Detail & Related papers (2021-08-04T15:11:48Z)
- Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences
This paper presents a novel self-supervised learning approach to learn both 2D image features and 3D point cloud features.
It exploits cross-modality and cross-view correspondences without using any annotated human labels.
The effectiveness of the learned 2D and 3D features is evaluated by transferring them to five different tasks.
arXiv Detail & Related papers (2020-04-13T02:57:25Z)