Related papers: Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering

Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering

URL: http://arxiv.org/abs/2310.17869v1
Date: Fri, 27 Oct 2023 03:07:05 GMT
Title: Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering
Authors: Zijie Song, Zhenzhen Hu and Richang Hong
Abstract summary: Jigsaw based strategy method for image clustering called Grid Jigsaw Representation (GJR) with systematic exposition from pixel to feature in discrepancy against human and computer. GJR modules are appended to a variety of deep convolutional networks and tested with significant improvements on a wide range of benchmark datasets. Experiment results show the effectiveness on the clustering task with respect to the ACC, NMI and ARI three metrics and super fast convergence speed.
Score: 37.15595383168132
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Unsupervised representation learning for image clustering is essential in computer vision. Although the advancement of visual models has improved image clustering with efficient visual representations, challenges still remain. Firstly, these features often lack the ability to represent the internal structure of images, hindering the accurate clustering of visually similar images. Secondly, the existing features tend to lack finer-grained semantic labels, limiting the ability to capture nuanced differences and similarities between images. In this paper, we first introduce Jigsaw based strategy method for image clustering called Grid Jigsaw Representation (GJR) with systematic exposition from pixel to feature in discrepancy against human and computer. We emphasize that this algorithm, which mimics human jigsaw puzzle, can effectively improve the model to distinguish the spatial feature between different samples and enhance the clustering ability. GJR modules are appended to a variety of deep convolutional networks and tested with significant improvements on a wide range of benchmark datasets including CIFAR-10, CIFAR-100/20, STL-10, ImageNet-10 and ImageNetDog-15. On the other hand, convergence efficiency is always an important challenge for unsupervised image clustering. Recently, pretrained representation learning has made great progress and released models can extract mature visual representations. It is obvious that use the pretrained model as feature extractor can speed up the convergence of clustering where our aim is to provide new perspective in image clustering with reasonable resource application and provide new baseline. Further, we innovate pretrain-based Grid Jigsaw Representation (pGJR) with improvement by GJR. The experiment results show the effectiveness on the clustering task with respect to the ACC, NMI and ARI three metrics and super fast convergence speed.

Related papers

Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images [14.836487514037994]
Sparse and noisy images (SNIs) pose significant challenges for effective representation learning and clustering. We propose Dual Advancement of Representation Learning and Clustering (DARLC) to enhance the representations derived from masked image modeling. Our framework offers a comprehensive approach that improves the learning of representations by enhancing their local perceptibility, distinctiveness, and the understanding of relational semantics.
arXiv Detail & Related papers (2024-09-03T10:52:27Z)
Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis. We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data. FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
Superpixel Graph Contrastive Clustering with Semantic-Invariant Augmentations for Hyperspectral Images [64.72242126879503]
Hyperspectral images (HSI) clustering is an important but challenging task. We first use 3-D and 2-D hybrid convolutional neural networks to extract the high-order spatial and spectral features of HSI. We then design a superpixel graph contrastive clustering model to learn discriminative superpixel representations.
arXiv Detail & Related papers (2024-03-04T07:40:55Z)
ClusterFormer: Clustering As A Universal Visual Learner [80.79669078819562]
CLUSTERFORMER is a universal vision model based on the CLUSTERing paradigm with TransFORMER. It is capable of tackling heterogeneous vision tasks with varying levels of clustering granularity. For its efficacy, we hope our work can catalyze a paradigm shift in universal models in computer vision.
arXiv Detail & Related papers (2023-09-22T22:12:30Z)
Masked Contrastive Graph Representation Learning for Age Estimation [44.96502862249276]
This paper utilizes the property of graph representation learning in dealing with image redundancy information. We propose a novel Masked Contrastive Graph Representation Learning (MCGRL) method for age estimation. Experimental results on real-world face image datasets demonstrate the superiority of our proposed method over other state-of-the-art age estimation approaches.
arXiv Detail & Related papers (2023-06-16T15:53:21Z)
Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models [37.574691902971296]
We propose a novel image clustering pipeline that leverages the powerful feature representation of large pre-trained models. We show that our pipeline works well on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet-1k.
arXiv Detail & Related papers (2023-06-08T15:20:27Z)
ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention. Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count. The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
Semantic-Enhanced Image Clustering [6.218389227248297]
We propose to investigate the task of image clustering with the help of a visual-language pre-training model. How to map images to a proper semantic space and how to cluster images from both image and semantic spaces are two key problems. We propose a method to map the given images to a proper semantic space first and efficient methods to generate pseudo-labels according to the relationships between images and semantics.
arXiv Detail & Related papers (2022-08-21T09:04:21Z)
Deep Image Clustering with Contrastive Learning and Multi-scale Graph Convolutional Networks [58.868899595936476]
This paper presents a new deep clustering approach termed image clustering with contrastive learning and multi-scale graph convolutional networks (IcicleGCN) Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.
arXiv Detail & Related papers (2022-07-14T19:16:56Z)
Vision Transformer for Contrastive Clustering [48.476602271481674]
Vision Transformer (ViT) has shown its advantages over the convolutional neural network (CNN) This paper presents an end-to-end deep image clustering approach termed Vision Transformer for Contrastive Clustering (VTCC)
arXiv Detail & Related papers (2022-06-26T17:00:35Z)
Clustering by Maximizing Mutual Information Across Views [62.21716612888669]
We propose a novel framework for image clustering that incorporates joint representation learning and clustering. Our method significantly outperforms state-of-the-art single-stage clustering methods across a variety of image datasets.
arXiv Detail & Related papers (2021-07-24T15:36:49Z)
Scalable Visual Transformers with Hierarchical Pooling [61.05787583247392]
We propose a Hierarchical Visual Transformer (HVT) which progressively pools visual tokens to shrink the sequence length. It brings a great benefit by scaling dimensions of depth/width/resolution/patch size without introducing extra computational complexity. Our HVT outperforms the competitive baselines on ImageNet and CIFAR-100 datasets.
arXiv Detail & Related papers (2021-03-19T03:55:58Z)
Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
G-SimCLR : Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling [0.8164433158925593]
In computer vision, it is evident that deep neural networks perform better in a supervised setting with a large amount of labeled data. In this work, we propose that, with the normalized temperature-scaled cross-entropy (NT-Xent) loss function, it is beneficial to not have images of the same category in the same batch. We use the latent space representation of a denoising autoencoder trained on the unlabeled dataset and cluster them with k-means to obtain pseudo labels.
arXiv Detail & Related papers (2020-09-25T02:25:37Z)
Deep Transformation-Invariant Clustering [24.23117820167443]
We present an approach that does not rely on abstract features but instead learns to predict image transformations. This learning process naturally fits in the gradient-based training of K-means and Gaussian mixture model. We demonstrate that our novel approach yields competitive and highly promising results on standard image clustering benchmarks.
arXiv Detail & Related papers (2020-06-19T13:43:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.