Contrastive Multi-Modal Clustering
- URL: http://arxiv.org/abs/2106.11193v1
- Date: Mon, 21 Jun 2021 15:32:34 GMT
- Title: Contrastive Multi-Modal Clustering
- Authors: Jie Xu, Huayi Tang, Yazhou Ren, Xiaofeng Zhu, Lifang He
- Abstract summary: We propose Contrastive Multi-Modal Clustering (CMMC) which can mine high-level semantic information via contrastive learning.
CMMC has good scalability and outperforms state-of-the-art multi-modal clustering methods.
- Score: 22.117014300127423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal clustering, which explores complementary information from
multiple modalities or views, has attracted increasing attention.
However, existing works rarely focus on extracting high-level semantic
information of multiple modalities for clustering. In this paper, we propose
Contrastive Multi-Modal Clustering (CMMC) which can mine high-level semantic
information via contrastive learning. Concretely, our framework consists of
three parts. (1) Multiple autoencoders are optimized to maintain each
modality's diversity so that complementary information is learned. (2) A feature
contrastive module is proposed to learn common high-level semantic features
from different modalities. (3) A label contrastive module aims to learn
consistent cluster assignments for all modalities. By the proposed multi-modal
contrastive learning, the mutual information of high-level features is
maximized, while the diversity of the low-level latent features is maintained.
In addition, to utilize the learned high-level semantic features, we further
generate pseudo labels by solving a maximum matching problem to fine-tune the
cluster assignments. Extensive experiments demonstrate that CMMC has good
scalability and outperforms state-of-the-art multi-modal clustering methods.
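The abstract does not come with code, but its three objectives can be made concrete. Below is a minimal PyTorch sketch, assuming an NT-Xent-style formulation for the feature and label contrastive modules and a Hungarian solver for the maximum matching step; the function names, temperatures, and two-modality restriction are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of CMMC-style objectives (illustrative, not the authors' code).
# Assumes two modalities; z1/z2 are features, p1/p2 are soft cluster assignments.
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def feature_contrastive_loss(z1, z2, temperature=0.5):
    """InfoNCE across modalities: the same sample's two views are positives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (N, N) cross-modal similarities
    targets = torch.arange(z1.size(0))          # positive pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def label_contrastive_loss(p1, p2, temperature=0.5):
    """Contrast assignment columns so all modalities agree on the same clusters."""
    q1, q2 = F.normalize(p1.t(), dim=1), F.normalize(p2.t(), dim=1)  # (K, N)
    logits = q1 @ q2.t() / temperature          # (K, K) cross-modal cluster pairs
    targets = torch.arange(q1.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def match_pseudo_labels(y_pred, y_ref, n_clusters):
    """Relabel y_pred's clusters to agree with y_ref via maximum matching."""
    cost = torch.zeros(n_clusters, n_clusters, dtype=torch.long)
    for a, b in zip(y_pred.tolist(), y_ref.tolist()):
        cost[a, b] += 1                          # contingency counts
    row, col = linear_sum_assignment(cost.numpy(), maximize=True)
    mapping = dict(zip(row.tolist(), col.tolist()))
    return torch.tensor([mapping[int(a)] for a in y_pred])
```

In this reading, the feature loss aligns per-sample embeddings across modalities, the label loss aligns per-cluster assignment vectors, and the matching step relabels one modality's clusters to agree with another's before pseudo-label fine-tuning.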
Related papers
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
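As a rough illustration of fusing modality feature maps at multiple scales (this is not U3M's architecture; the shapes, scale set, and equal-weight fusion are assumptions):

```python
# Illustrative multiscale fusion of two modality feature maps (not U3M's code).
import torch
import torch.nn.functional as F

def multiscale_fuse(feat_a, feat_b, scales=(1, 2, 4)):
    """Average two (B, C, H, W) feature maps pooled at several scales,
    then upsample and sum, mixing local and global context."""
    B, C, H, W = feat_a.shape
    fused = torch.zeros_like(feat_a)
    for s in scales:
        pa = F.adaptive_avg_pool2d(feat_a, (H // s, W // s))
        pb = F.adaptive_avg_pool2d(feat_b, (H // s, W // s))
        merged = 0.5 * (pa + pb)   # equal-weight ("unbiased") fusion, assumed
        fused = fused + F.interpolate(merged, size=(H, W),
                                      mode='bilinear', align_corners=False)
    return fused / len(scales)
```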
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
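The early-fusion, single-stream idea can be sketched as follows; the dimensions and module names are invented for illustration and do not come from UmURL:

```python
# Sketch of an early-fusion, single-stream encoder (illustrative only).
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    def __init__(self, modality_dims=(64, 32), hidden_dim=128):
        super().__init__()
        # One lightweight projection per modality, then a single shared stream.
        self.projections = nn.ModuleList(
            [nn.Linear(d, hidden_dim) for d in modality_dims])
        self.backbone = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim))

    def forward(self, modalities):
        # Fuse before encoding: sum the projected modalities, encode once.
        fused = sum(proj(x) for proj, x in zip(self.projections, modalities))
        return self.backbone(fused)

# Usage: two modalities for one batch of 8 samples.
enc = EarlyFusionEncoder()
out = enc([torch.randn(8, 64), torch.randn(8, 32)])  # -> (8, 128)
```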
- Towards Generalized Multi-stage Clustering: Multi-view Self-distillation [10.368796552760571]
Existing multi-stage clustering methods independently learn the salient features from multiple views and then perform the clustering task.
This paper proposes a novel multi-stage deep MVC framework where multi-view self-distillation (DistilMVC) is introduced to distill the dark knowledge of label distributions.
arXiv Detail & Related papers (2023-10-29T03:35:34Z)
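Distilling "dark knowledge" of a label distribution is commonly implemented as a temperature-softened KL divergence between teacher and student logits. A generic sketch follows (DistilMVC's exact objective may differ):

```python
# Generic dark-knowledge distillation loss (not DistilMVC's exact formulation).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened label distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher,
                    reduction='batchmean') * temperature ** 2
```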
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
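One simple way to realize such a common space, so that any subset of modalities can be encoded, is a per-modality projection into a shared dimension (hypothetical names and sizes, not the paper's module):

```python
# Hypothetical sketch: project different modalities into one shared space.
import torch
import torch.nn as nn

class CommonSpaceProjector(nn.Module):
    def __init__(self, modality_dims, common_dim=256):
        super().__init__()
        # One head per known modality; any combination of modalities lands in
        # the same space because every head maps there.
        self.heads = nn.ModuleDict({
            name: nn.Linear(dim, common_dim)
            for name, dim in modality_dims.items()})

    def forward(self, inputs):
        # inputs: dict of modality name -> (B, dim); average whatever is present.
        projected = [self.heads[name](x) for name, x in inputs.items()]
        return torch.stack(projected).mean(dim=0)

proj = CommonSpaceProjector({'rgb': 512, 'audio': 128})
z = proj({'rgb': torch.randn(4, 512)})   # works with any subset of modalities
```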
- One-step Multi-view Clustering with Diverse Representation [47.41455937479201]
We propose a one-step multi-view clustering with diverse representation method, which incorporates multi-view learning and $k$-means into a unified framework.
We develop an efficient optimization algorithm with proven convergence to solve the resultant problem.
arXiv Detail & Related papers (2023-06-08T02:52:24Z)
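The paper couples representation learning and $k$-means in a single objective; as a far simpler baseline that only illustrates the multi-view-plus-$k$-means idea (not the one-step method itself):

```python
# Simple multi-view baseline: concatenate normalized views, then k-means.
# This is NOT the paper's unified one-step objective, just an illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def multiview_kmeans(views, n_clusters):
    """views: list of (N, d_v) arrays, one per view."""
    joint = np.hstack([normalize(v) for v in views])  # unit-norm rows per view
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(joint)

labels = multiview_kmeans([np.random.randn(100, 16),
                           np.random.randn(100, 8)], n_clusters=4)
```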
- Multi-view Semantic Consistency based Information Bottleneck for Clustering [13.589996737740208]
We introduce a novel Multi-view Semantic Consistency based Information Bottleneck for clustering (MSCIB).
MSCIB pursues semantic consistency to improve the learning process of information bottleneck for different views.
It conducts the alignment operation of multiple views in the semantic space and jointly achieves the valuable consistent information of multi-view data.
arXiv Detail & Related papers (2023-02-28T02:01:58Z)
- MCoCo: Multi-level Consistency Collaborative Multi-view Clustering [15.743056561394612]
Multi-view clustering can explore consistent information from different views to guide clustering.
We propose a novel Multi-level Consistency Collaborative learning framework (MCoCo) for multi-view clustering.
arXiv Detail & Related papers (2023-02-26T16:08:53Z)
- Dual Information Enhanced Multi-view Attributed Graph Clustering [11.624319530337038]
A novel Dual Information enhanced multi-view Attributed Graph Clustering (DIAGC) method is proposed in this paper.
The proposed method introduces the Specific Information Reconstruction (SIR) module to disentangle the explorations of the consensus and specific information from multiple views.
The Mutual Information Maximization (MIM) module maximizes the agreement between the latent high-level representation and low-level ones, and enables the high-level representation to satisfy the desired clustering structure.
arXiv Detail & Related papers (2022-11-28T01:18:04Z)
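Agreement between high-level and low-level representations is often maximized through an InfoNCE-style lower bound on mutual information; a generic sketch follows (DIAGC's exact MIM loss may differ):

```python
# Generic InfoNCE-style mutual-information lower bound (not DIAGC's exact loss).
import torch
import torch.nn.functional as F

def infonce_mi_loss(z_high, z_low, temperature=0.2):
    """Pull each sample's high-level and low-level embeddings together,
    pushing apart mismatched pairs within the batch."""
    z_high = F.normalize(z_high, dim=1)
    z_low = F.normalize(z_low, dim=1)
    logits = z_high @ z_low.t() / temperature   # (N, N) pairwise similarities
    targets = torch.arange(z_high.size(0))      # matched pairs on the diagonal
    return F.cross_entropy(logits, targets)
```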
- Deep Attention-guided Graph Clustering with Dual Self-supervision [49.040136530379094]
We propose a novel method, namely deep attention-guided graph clustering with dual self-supervision (DAGC).
We develop a dual self-supervision solution consisting of a soft self-supervision strategy with a triplet Kullback-Leibler divergence loss and a hard self-supervision strategy with a pseudo supervision loss.
Our method consistently outperforms state-of-the-art methods on six benchmark datasets.
arXiv Detail & Related papers (2021-11-10T06:53:03Z)
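The soft self-supervision with a KL loss resembles the widely used DEC-style target-distribution sharpening, sketched generically below; the paper's triplet KL variant is not reproduced here, and the parameter choices are assumptions:

```python
# DEC-style soft self-supervision (generic; DAGC's triplet KL variant differs).
import torch

def soft_assignments(z, centroids, alpha=1.0):
    """Student's t-kernel similarity between embeddings and cluster centroids."""
    dist_sq = torch.cdist(z, centroids) ** 2
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Sharpen assignments: emphasize confident samples, normalize per cluster."""
    weight = q ** 2 / q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)

def kl_self_supervision(q):
    """KL(P || Q) between the sharpened target P and soft assignments Q."""
    p = target_distribution(q).detach()
    return (p * (p.log() - q.log())).sum(dim=1).mean()
```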
- Unsupervised Person Re-Identification with Multi-Label Learning Guided Self-Paced Clustering [48.31017226618255]
Unsupervised person re-identification (Re-ID) has drawn increasing research attention recently.
In this paper, we address unsupervised person Re-ID with a conceptually novel yet simple framework, termed Multi-label Learning guided self-paced Clustering (MLC).
MLC mainly learns discriminative features with three crucial modules, namely a multi-scale network, a multi-label learning module, and a self-paced clustering module.
arXiv Detail & Related papers (2021-03-08T07:30:13Z)
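Self-paced clustering generally trains on easy (low-loss) samples first and admits harder ones over time; a minimal illustration of that schedule (not MLC's actual module):

```python
# Minimal self-paced sample selection (illustrative, not MLC's module).
import numpy as np

def self_paced_mask(losses, epoch, total_epochs, start_frac=0.5):
    """Keep the lowest-loss fraction of samples, growing it linearly to 100%."""
    frac = start_frac + (1.0 - start_frac) * epoch / max(1, total_epochs - 1)
    k = max(1, int(frac * len(losses)))
    threshold = np.partition(losses, k - 1)[k - 1]  # k-th smallest loss
    return losses <= threshold

losses = np.random.rand(1000)
mask = self_paced_mask(losses, epoch=0, total_epochs=50)  # ~50% easiest kept
```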
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.