Related papers: Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

URL: http://arxiv.org/abs/2306.05272v5
Date: Fri, 26 Apr 2024 14:10:49 GMT
Title: Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models
Authors: Tianzhe Chu, Shengbang Tong, Tianjiao Ding, Xili Dai, Benjamin David Haeffele, René Vidal, Yi Ma,
Abstract summary: We propose a novel image clustering pipeline that leverages the powerful feature representation of large pre-trained models. We show that our pipeline works well on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet-1k.
Score: 37.574691902971296
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The advent of large pre-trained models has brought about a paradigm shift in both visual representation learning and natural language processing. However, clustering unlabeled images, as a fundamental and classic machine learning problem, still lacks an effective solution, particularly for large-scale datasets. In this paper, we propose a novel image clustering pipeline that leverages the powerful feature representation of large pre-trained models such as CLIP and cluster images effectively and efficiently at scale. We first developed a novel algorithm to estimate the number of clusters in a given dataset. We then show that the pre-trained features are significantly more structured by further optimizing the rate reduction objective. The resulting features may significantly improve the clustering accuracy, e.g., from 57\% to 66\% on ImageNet-1k. Furthermore, by leveraging CLIP's multimodality bridge between image and text, we develop a simple yet effective self-labeling algorithm that produces meaningful captions for the clusters. Through extensive experiments, we show that our pipeline works well on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet-1k. It also extends to datasets that are not curated for clustering, such as LAION-Aesthetics and WikiArts. We released the code in https://github.com/LeslieTrue/CPP.

Related papers

Self-Enhanced Image Clustering with Cross-Modal Semantic Consistency [57.961869351897384]
We propose a framework based on cross-modal semantic consistency for efficient image clustering.<n>Our framework first builds a strong foundation via Cross-Modal Semantic Consistency.<n>In the first stage, we train lightweight clustering heads to align with the rich semantics of the pre-trained model.<n>In the second stage, we introduce a Self-Enhanced fine-tuning strategy.
arXiv Detail & Related papers (2025-08-02T08:12:57Z)
UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation [64.01742988773745]
An increasing privacy concern exists regarding training large-scale image segmentation models on unauthorized private data. We exploit the concept of unlearnable examples to make images unusable to model training by generating and adding unlearnable noise into the original images. We empirically verify the effectiveness of UnSeg across 6 mainstream image segmentation tasks, 10 widely used datasets, and 7 different network architectures.
arXiv Detail & Related papers (2024-10-13T16:34:46Z)
Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering [37.15595383168132]
Jigsaw based strategy method for image clustering called Grid Jigsaw Representation (GJR) with systematic exposition from pixel to feature in discrepancy against human and computer. GJR modules are appended to a variety of deep convolutional networks and tested with significant improvements on a wide range of benchmark datasets. Experiment results show the effectiveness on the clustering task with respect to the ACC, NMI and ARI three metrics and super fast convergence speed.
arXiv Detail & Related papers (2023-10-27T03:07:05Z)
Exploring the Limits of Deep Image Clustering using Pretrained Models [1.1060425537315088]
We present a methodology that learns to classify images without labels by leveraging pretrained feature extractors. We propose a novel objective that learns associations between image features by introducing a variant of pointwise mutual information together with instance weighting.
arXiv Detail & Related papers (2023-03-31T08:56:29Z)
ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention. Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count. The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models [54.49581189337848]
We propose a method to enable the end-to-end pre-training for image segmentation models based on classification datasets. The proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse. Experiment results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models.
arXiv Detail & Related papers (2022-07-04T13:02:32Z)
Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [57.33699905852397]
We propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. Our method simultaneously clusters the data while enforcing consistency between cluster assignments. Our method can be trained with large and small batches and can scale to unlimited amounts of data.
arXiv Detail & Related papers (2020-06-17T14:00:42Z)
SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled. A self-supervised task from representation learning is employed to obtain semantically meaningful features. We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering [9.722607434532883]
We propose a self-supervised clustering network for image Clustering (GATCluster) Rather than extracting intermediate features first and then performing the traditional clustering, GATCluster semantic cluster labels without further post-processing. We develop a two-step learning algorithm that is memory-efficient for clustering large-size images.
arXiv Detail & Related papers (2020-02-27T00:57:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.