PointCMC: Cross-Modal Multi-Scale Correspondences Learning for Point
Cloud Understanding
- URL: http://arxiv.org/abs/2211.12032v2
- Date: Wed, 23 Nov 2022 15:06:57 GMT
- Title: PointCMC: Cross-Modal Multi-Scale Correspondences Learning for Point
Cloud Understanding
- Authors: Honggu Zhou, Xiaogang Peng, Jiawei Mao, Zizhao Wu, Ming Zeng
- Abstract summary: PointCMC is a cross-modal method that models multi-scale correspondences across modalities for self-supervised point cloud representation learning.
- Score: 0.875967561330372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Some self-supervised cross-modal learning approaches have recently
demonstrated the potential of image signals for enhancing point cloud
representation. However, it remains an open question how to directly model
cross-modal local and global correspondences in a self-supervised fashion. To
address this, we propose PointCMC, a novel cross-modal method that models
multi-scale correspondences across modalities for self-supervised point cloud
representation learning. In particular, PointCMC is composed of: (1) a
local-to-local (L2L) module that learns local correspondences through optimized
cross-modal local geometric features, (2) a local-to-global (L2G) module that
learns the correspondences between local and global features across modalities
via local-global discrimination, and (3) a global-to-global (G2G) module, which
leverages an auxiliary global contrastive loss between the point cloud and the
image to learn high-level semantic correspondences. Extensive experimental
results show that our approach outperforms existing state-of-the-art methods
on various downstream tasks such as 3D object classification and segmentation.
Code will be made publicly available upon acceptance.
Related papers
- BCLNet: Bilateral Consensus Learning for Two-View Correspondence Pruning [26.400567961735234]
Correspondence pruning aims to establish reliable correspondences between two related images.
Existing approaches often employ a progressive strategy to handle the local and global contexts.
We propose a parallel context learning strategy that involves acquiring bilateral consensus for the two-view correspondence pruning task.
arXiv Detail & Related papers (2024-01-07T11:38:15Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- 3DGTN: 3D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation [21.054928631088575]
This paper presents a novel point cloud representation learning network, called the 3D Dual Self-attention Global Local (GLocal) Transformer Network (3DGTN).
The proposed framework is evaluated on both classification and segmentation datasets.
arXiv Detail & Related papers (2022-09-21T14:34:21Z)
- Multi-scale Network with Attentional Multi-resolution Fusion for Point Cloud Semantic Segmentation [2.964101313270572]
We present a comprehensive point cloud semantic segmentation network that aggregates both local and global multi-scale information.
We introduce an Angle Correlation Point Convolution module to effectively learn the local shapes of points.
Inspired by HRNet, which performs well on 2D image vision tasks, we build an HRNet customized for point clouds to learn global multi-scale context.
arXiv Detail & Related papers (2022-06-27T21:03:33Z)
- Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification [2.294635424666456]
We propose the Cross-modal Local Shortest Path and Global Enhancement (CM-LSP-GE) modules, a two-stream network based on the joint learning of local and global features.
The experimental results on two typical datasets show that our model is clearly superior to state-of-the-art methods.
arXiv Detail & Related papers (2022-06-09T10:27:22Z)
- Global-and-Local Collaborative Learning for Co-Salient Object Detection [162.62642867056385]
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images.
We propose a global-and-local collaborative learning architecture, which includes global correspondence modeling (GCM) and local correspondence modeling (LCM).
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on much larger datasets (about 8k-200k images).
arXiv Detail & Related papers (2022-04-19T14:32:41Z)
- PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis [56.91758845045371]
We propose a novel framework named the Point Relation-Aware Network (PRA-Net).
It is composed of an Intra-region Structure Learning (ISL) module and an Inter-region Relation Learning (IRL) module.
Experiments on several 3D benchmarks covering shape classification, keypoint estimation, and part segmentation verify the effectiveness of PRA-Net.
arXiv Detail & Related papers (2021-12-09T13:24:43Z)
- Global Aggregation then Local Distribution for Scene Parsing [99.1095068574454]
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks.
Our approach sets a new state of the art on major semantic segmentation benchmarks, including Cityscapes, ADE20K, Pascal Context, CamVid and COCO-Stuff.
arXiv Detail & Related papers (2021-07-28T03:46:57Z)
- Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization [74.34699679568818]
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in the given video with video-level categorical supervision.
We propose a cross-modal consensus network (CO2-Net) to tackle this problem.
arXiv Detail & Related papers (2021-07-27T04:21:01Z)
- LRC-Net: Learning Discriminative Features on Point Clouds by Encoding Local Region Contexts [65.79931333193016]
We present a novel Local-Region-Context Network (LRC-Net) to learn discriminative features on point clouds.
LRC-Net encodes fine-grained contexts inside and among local regions simultaneously.
Results show LRC-Net is competitive with state-of-the-art methods in shape classification and shape segmentation applications.
arXiv Detail & Related papers (2020-03-18T14:34:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.