PointCMC: Cross-Modal Multi-Scale Correspondences Learning for Point
Cloud Understanding
- URL: http://arxiv.org/abs/2211.12032v2
- Date: Wed, 23 Nov 2022 15:06:57 GMT
- Title: PointCMC: Cross-Modal Multi-Scale Correspondences Learning for Point
Cloud Understanding
- Authors: Honggu Zhou, Xiaogang Peng, Jiawei Mao, Zizhao Wu, Ming Zeng
- Abstract summary: PointCMC is a cross-modal method that models multi-scale correspondences across modalities for self-supervised point cloud representation learning.
- Score: 0.875967561330372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Some self-supervised cross-modal learning approaches have recently
demonstrated the potential of image signals for enhancing point cloud
representation. However, it remains an open question how to directly model
cross-modal local and global correspondences in a self-supervised fashion. To
address this, we propose PointCMC, a novel cross-modal method that models
multi-scale correspondences across modalities for self-supervised point cloud
representation learning. In particular, PointCMC is composed of: (1) a
local-to-local (L2L) module that learns local correspondences through optimized
cross-modal local geometric features, (2) a local-to-global (L2G) module that
learns the correspondences between local and global features across modalities
via local-global discrimination, and (3) a global-to-global (G2G) module, which
leverages an auxiliary global contrastive loss between the point cloud and the
image to learn high-level semantic correspondences. Extensive experimental
results show that our approach outperforms existing state-of-the-art methods
on various downstream tasks such as 3D object classification and segmentation.
Code will be made publicly available upon acceptance.
Related papers
- BCLNet: Bilateral Consensus Learning for Two-View Correspondence Pruning [26.400567961735234]
Correspondence pruning aims to establish reliable correspondences between two related images.
Existing approaches often employ a progressive strategy to handle the local and global contexts.
We propose a parallel context learning strategy that involves acquiring bilateral consensus for the two-view correspondence pruning task.
arXiv Detail & Related papers (2024-01-07T11:38:15Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- 3DGTN: 3D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation [21.054928631088575]
This paper presents a novel point cloud representation learning network, called the 3D Dual Self-attention Global Local (GLocal) Transformer Network (3DGTN).
The proposed framework is evaluated on both classification and segmentation datasets.
arXiv Detail & Related papers (2022-09-21T14:34:21Z)
- Multi-scale Network with Attentional Multi-resolution Fusion for Point Cloud Semantic Segmentation [2.964101313270572]
We present a comprehensive point cloud semantic segmentation network that aggregates both local and global multi-scale information.
We introduce an Angle Correlation Point Convolution module to effectively learn the local shapes of points.
Inspired by HRNet, which performs well on 2D image vision tasks, we build an HRNet customized for point clouds to learn global multi-scale context.
arXiv Detail & Related papers (2022-06-27T21:03:33Z)
- Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification [2.294635424666456]
We propose the Cross-modal Local Shortest Path and Global Enhancement (CM-LSP-GE) modules, a two-stream network based on the joint learning of local and global features.
The experimental results on two typical datasets show that our model is clearly superior to state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T10:27:22Z)
- Global-and-Local Collaborative Learning for Co-Salient Object Detection [162.62642867056385]
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images.
We propose a global-and-local collaborative learning architecture, which includes global correspondence modeling (GCM) and local correspondence modeling (LCM).
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on much larger datasets (about 8k-200k images).
arXiv Detail & Related papers (2022-04-19T14:32:41Z)
- PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis [56.91758845045371]
We propose a novel framework named the Point Relation-Aware Network (PRA-Net).
It is composed of an Intra-region Structure Learning (ISL) module and an Inter-region Relation Learning (IRL) module.
Experiments on several 3D benchmarks covering shape classification, keypoint estimation, and part segmentation verify the effectiveness of PRA-Net.
arXiv Detail & Related papers (2021-12-09T13:24:43Z)
- Global Aggregation then Local Distribution for Scene Parsing [99.1095068574454]
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks.
Our approach sets a new state of the art on major semantic segmentation benchmarks, including Cityscapes, ADE20K, Pascal Context, CamVid and COCO-Stuff.
arXiv Detail & Related papers (2021-07-28T03:46:57Z)
- Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization [74.34699679568818]
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in the given video with video-level categorical supervision.
We propose a cross-modal consensus network (CO2-Net) to tackle this problem.
arXiv Detail & Related papers (2021-07-27T04:21:01Z)
- LRC-Net: Learning Discriminative Features on Point Clouds by Encoding Local Region Contexts [65.79931333193016]
We present a novel Local-Region-Context Network (LRC-Net) to learn discriminative features on point clouds.
LRC-Net encodes fine-grained contexts inside and among local regions simultaneously.
Results show LRC-Net is competitive with state-of-the-art methods in shape classification and shape segmentation applications.
arXiv Detail & Related papers (2020-03-18T14:34:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.