C3: Cross-instance guided Contrastive Clustering
- URL: http://arxiv.org/abs/2211.07136v4
- Date: Fri, 1 Sep 2023 04:27:44 GMT
- Title: C3: Cross-instance guided Contrastive Clustering
- Authors: Mohammadreza Sadeghi, Hadi Hojjati, Narges Armanfard
- Abstract summary: Clustering is the task of gathering similar data samples into clusters without using any predefined labels.
We propose a novel contrastive clustering method, Cross-instance guided Contrastive Clustering (C3)
Our proposed method can outperform state-of-the-art algorithms on benchmark computer vision datasets.
- Score: 8.953252452851862
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clustering is the task of gathering similar data samples into clusters
without using any predefined labels. It has been widely studied in machine
learning literature, and recent advancements in deep learning have revived
interest in this field. Contrastive clustering (CC) models are a staple of deep
clustering in which positive and negative pairs of each data instance are
generated through data augmentation. CC models aim to learn a feature space
where instance-level and cluster-level representations of positive pairs are
grouped together. Despite improving the SOTA, these algorithms ignore the
cross-instance patterns, which carry essential information for improving
clustering performance. This increases the false-negative-pair rate of the
model while decreasing its true-positive-pair rate. In this paper, we propose a
novel contrastive clustering method, Cross-instance guided Contrastive
Clustering (C3), that considers the cross-sample relationships to increase the
number of positive pairs and mitigate the impact of false negative, noise, and
anomaly sample on the learned representation of data. In particular, we define
a new loss function that identifies similar instances using the instance-level
representation and encourages them to aggregate together. Moreover, we propose
a novel weighting method to select negative samples in a more efficient way.
Extensive experimental evaluations show that our proposed method can outperform
state-of-the-art algorithms on benchmark computer vision datasets: we improve
the clustering accuracy by 6.6%, 3.3%, 5.0%, 1.3% and 0.3% on CIFAR-10,
CIFAR-100, ImageNet-10, ImageNet-Dogs, and Tiny-ImageNet.
Related papers
- Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task.
We propose a co-training-based framework that encourages clustering consistency.
Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z) - Semantic Positive Pairs for Enhancing Visual Representation Learning of Instance Discrimination methods [4.680881326162484]
Self-supervised learning algorithms (SSL) based on instance discrimination have shown promising results.
We propose an approach to identify those images with similar semantic content and treat them as positive instances.
We run experiments on three benchmark datasets: ImageNet, STL-10 and CIFAR-10 with different instance discrimination SSL approaches.
arXiv Detail & Related papers (2023-06-28T11:47:08Z) - CLC: Cluster Assignment via Contrastive Representation Learning [9.631532215759256]
We propose Contrastive Learning-based Clustering (CLC), which uses contrastive learning to directly learn cluster assignment.
We achieve 53.4% accuracy on the full ImageNet dataset and outperform existing methods by large margins.
arXiv Detail & Related papers (2023-06-08T07:15:13Z) - Cluster-guided Contrastive Graph Clustering Network [53.16233290797777]
We propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC)
We construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks.
To construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples.
arXiv Detail & Related papers (2023-01-03T13:42:38Z) - Exploring Non-Contrastive Representation Learning for Deep Clustering [23.546602131801205]
Non-contrastive representation learning for deep clustering, termed NCC, is based on BYOL, a representative method without negative examples.
NCC forms an embedding space where all clusters are well-separated and within-cluster examples are compact.
Experimental results on several clustering benchmark datasets including ImageNet-1K demonstrate that NCC outperforms the state-of-the-art methods by a significant margin.
arXiv Detail & Related papers (2021-11-23T12:21:53Z) - Learning Statistical Representation with Joint Deep Embedded Clustering [2.1267423178232407]
StatDEC is an unsupervised framework for joint statistical representation learning and clustering.
Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets.
arXiv Detail & Related papers (2021-09-11T09:26:52Z) - Neighborhood Contrastive Learning for Novel Class Discovery [79.14767688903028]
We build a new framework, named Neighborhood Contrastive Learning, to learn discriminative representations that are important to clustering performance.
We experimentally demonstrate that these two ingredients significantly contribute to clustering performance and lead our model to outperform state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-06-20T17:34:55Z) - Solving Inefficiency of Self-supervised Representation Learning [87.30876679780532]
Existing contrastive learning methods suffer from very low learning efficiency.
Under-clustering and over-clustering problems are major obstacles to learning efficiency.
We propose a novel self-supervised learning framework using a median triplet loss.
arXiv Detail & Related papers (2021-04-18T07:47:10Z) - Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19% (39%) performance improvement compared with the best baseline.
arXiv Detail & Related papers (2020-09-21T08:54:40Z) - LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.