Semantic-Enhanced Image Clustering
- URL: http://arxiv.org/abs/2208.09849v2
- Date: Sun, 9 Apr 2023 02:33:10 GMT
- Title: Semantic-Enhanced Image Clustering
- Authors: Shaotian Cai, Liping Qiu, Xiaojun Chen, Qin Zhang, Longteng Chen
- Abstract summary: We propose to investigate the task of image clustering with the help of a visual-language pre-training model.
How to map images to a proper semantic space and how to cluster images from both image and semantic spaces are two key problems.
We first map the given images to a proper semantic space and then propose efficient methods to generate pseudo-labels according to the relationships between images and semantics.
- Score: 6.218389227248297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image clustering is an important and challenging open task in computer
vision. Although many methods have been proposed to solve the image clustering
task, they only explore images and uncover clusters according to the image
features, thus being unable to distinguish visually similar but semantically
different images. In this paper, we propose to investigate the task of image
clustering with the help of a visual-language pre-training model. Different
from the zero-shot setting, in which the class names are known, we only know
the number of clusters in this setting. Therefore, how to map images to a
proper semantic space and how to cluster images from both image and semantic
spaces are two key problems. To solve the above problems, we propose a novel
image clustering method guided by the visual-language pre-training model CLIP,
named \textbf{Semantic-Enhanced Image Clustering (SIC)}. In this method, we first
map the given images to a proper semantic space and then propose efficient
methods to generate pseudo-labels according to the relationships between images
and semantics. Finally, we propose performing clustering with
consistency learning in both image space and semantic space, in a
self-supervised learning fashion. A theoretical convergence analysis shows that
our proposed method converges at a sublinear rate. A theoretical analysis of the
expected risk also shows that we can reduce the
expected risk by improving neighborhood consistency, increasing prediction
confidence, or reducing neighborhood imbalance. Experimental results on five
benchmark datasets clearly show the superiority of our new method.
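As an illustration only, the toy sketch below shows how an image space and a CLIP-derived semantic space can be clustered jointly. It is not the authors' code: the anchor vocabulary, the k-means step, and the neighbour-agreement rule are assumptions, and random vectors stand in for real CLIP embeddings.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

n_images, dim, n_clusters = 500, 512, 10
image_feats = rng.normal(size=(n_images, dim))   # stand-in for CLIP image embeddings
anchor_feats = rng.normal(size=(300, dim))       # stand-in for CLIP text embeddings of a noun vocabulary

# L2-normalise so dot products behave like cosine similarities, as in CLIP.
image_feats /= np.linalg.norm(image_feats, axis=1, keepdims=True)
anchor_feats /= np.linalg.norm(anchor_feats, axis=1, keepdims=True)

# "Semantic space": describe each image by its similarity profile over the text anchors.
semantic_feats = image_feats @ anchor_feats.T    # shape (n_images, n_anchors)

# Cluster independently in the image space and in the semantic space.
img_labels = KMeans(n_clusters, n_init=10, random_state=0).fit_predict(image_feats)
sem_labels = KMeans(n_clusters, n_init=10, random_state=0).fit_predict(semantic_feats)

# Toy pseudo-labelling rule: trust an image only if its nearest neighbour in the
# image space lands in the same cluster in both spaces (a stand-in for SIC's
# image-semantic consistency idea, not the paper's actual rule).
sims = image_feats @ image_feats.T
np.fill_diagonal(sims, -np.inf)
nearest = sims.argmax(axis=1)
confident = (img_labels == img_labels[nearest]) & (sem_labels == sem_labels[nearest])
print(f"{confident.mean():.0%} of images get a consistent pseudo-label in this toy run")
```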
Related papers
- Dual-Level Cross-Modal Contrastive Clustering [4.083185193413678]
We propose a novel image clustering framework, named Dual-level Cross-Modal Contrastive Clustering (DXMC).
External textual information is introduced for constructing a semantic space, which is adopted to generate image-text pairs.
The image-text pairs are respectively sent to pre-trained image and text encoders to obtain image and text embeddings, which are subsequently fed into four well-designed networks.
arXiv Detail & Related papers (2024-09-06T18:49:45Z)
- Local Clustering for Lung Cancer Image Classification via Sparse Solution Technique [1.07793546088014]
We view images as the vertices in a weighted graph and the similarities between pairs of images as the edge weights in the graph.
Our approach is significantly more efficient than other state-of-the-art approaches while being either better or equally effective.
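A minimal sketch of the graph construction this summary describes: images become vertices and pairwise similarities become weighted edges. Random feature vectors stand in for real image features, and the Gaussian kernel plus k-nearest-neighbour sparsification are assumed design choices; the paper's sparse-solution clustering step is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 64))                 # one feature vector per image (random stand-ins)

# Pairwise squared Euclidean distances between all images.
sq_dists = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)

# Gaussian-kernel edge weights; the bandwidth sigma is a hypothetical choice.
sigma = np.median(np.sqrt(sq_dists))
weights = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(weights, 0.0)                     # no self-loops

# Keep only each vertex's k strongest edges so the graph stays sparse.
k = 10
thresh = np.sort(weights, axis=1)[:, -k][:, None]
adjacency = np.where(weights >= thresh, weights, 0.0)
adjacency = np.maximum(adjacency, adjacency.T)     # symmetrise
print("edges kept:", int((adjacency > 0).sum() / 2))
```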
arXiv Detail & Related papers (2024-07-11T18:18:32Z)
- Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering [37.15595383168132]
A jigsaw-based strategy for image clustering, called Grid Jigsaw Representation (GJR), is proposed, with a systematic exposition from pixel to feature of the discrepancy between human and computer perception.
GJR modules are appended to a variety of deep convolutional networks and tested with significant improvements on a wide range of benchmark datasets.
Experimental results show effectiveness on the clustering task with respect to three metrics (ACC, NMI, and ARI), as well as very fast convergence.
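As a toy illustration of the jigsaw idea only (the GJR module itself works from pixel to feature level and is not reproduced here), the sketch below cuts an image into a grid of patches and permutes them; all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grid_jigsaw(image: np.ndarray, grid: int = 3) -> np.ndarray:
    """Shuffle the cells of an (H, W, C) image arranged as a grid x grid jigsaw."""
    h, w, c = image.shape
    ph, pw = h // grid, w // grid
    # Cut the image into grid*grid patches, row by row.
    patches = [image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
               for i in range(grid) for j in range(grid)]
    order = rng.permutation(len(patches))
    shuffled = [patches[k] for k in order]
    # Stitch the permuted patches back into an image of the same size.
    rows = [np.concatenate(shuffled[i * grid:(i + 1) * grid], axis=1) for i in range(grid)]
    return np.concatenate(rows, axis=0)

img = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)
print(grid_jigsaw(img).shape)   # (96, 96, 3)
```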
arXiv Detail & Related papers (2023-10-27T03:07:05Z)
- What's in a Name? Beyond Class Indices for Image Recognition [28.02490526407716]
We propose assigning class names to images with a vision-language model, given only a large (essentially unconstrained) vocabulary of categories as prior information.
We leverage non-parametric methods to establish meaningful relationships between images, allowing the model to automatically narrow down the pool of candidate names.
Our method leads to a roughly 50% improvement over the baseline on ImageNet in the unsupervised setting.
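A rough sketch of the non-parametric narrowing described above, not the paper's actual method: each image scores the whole vocabulary, and a small neighbourhood of similar images votes on a shared name. Random vectors stand in for CLIP image and text embeddings, and the neighbourhood size and voting rule are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(200, 512))        # stand-in for CLIP image embeddings
txt = rng.normal(size=(5000, 512))       # stand-in for CLIP text embeddings, one per vocabulary name
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)

scores = img @ txt.T                               # image-to-name similarities
top_names = np.argsort(-scores, axis=1)[:, :10]    # each image's 10 best candidate names

# Non-parametric narrowing: pool the candidates of an image's 5 nearest neighbours
# (in image space) and keep the most frequent name in the pool.
neighbours = np.argsort(-(img @ img.T), axis=1)[:, 1:6]
assigned = []
for i in range(len(img)):
    pool = top_names[neighbours[i]].ravel()
    names, counts = np.unique(pool, return_counts=True)
    assigned.append(int(names[counts.argmax()]))
print("first five assigned name indices:", assigned[:5])
```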
arXiv Detail & Related papers (2023-04-05T11:01:23Z)
- Clustering by Maximizing Mutual Information Across Views [62.21716612888669]
We propose a novel framework for image clustering that incorporates joint representation learning and clustering.
Our method significantly outperforms state-of-the-art single-stage clustering methods across a variety of image datasets.
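The summary is terse, so here is a generic sketch of one standard way to maximise mutual information between the cluster assignments of two augmented views (an IIC-style objective); the paper's actual loss may differ, and the probabilities below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

p1 = softmax(rng.normal(size=(256, 10)))   # cluster probabilities for view 1 of each image
p2 = softmax(rng.normal(size=(256, 10)))   # cluster probabilities for view 2 (augmented copy)

# Joint distribution over cluster pairs, averaged over the batch and symmetrised.
joint = p1.T @ p2 / len(p1)
joint = (joint + joint.T) / 2
marg1 = joint.sum(axis=1, keepdims=True)
marg2 = joint.sum(axis=0, keepdims=True)

# Mutual information I(z1; z2); maximising it pushes the two views toward the same
# cluster while keeping the cluster marginals from collapsing.
mi = (joint * (np.log(joint + 1e-12) - np.log(marg1 + 1e-12) - np.log(marg2 + 1e-12))).sum()
print(f"MI of random assignments: {mi:.4f}")
```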
arXiv Detail & Related papers (2021-07-24T15:36:49Z)
- Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework, which is then applied to the clustering task, yielding the Graph Contrastive Clustering (GCC) method.
Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
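A hedged sketch of the graph-smoothness intuition behind a Laplacian-based loss, in a generic formulation rather than GCC's exact objective: features joined by a kNN-graph edge are encouraged to agree, which is what the trace term below measures.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(128, 32))                 # random stand-ins for learned features
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# Build a simple kNN graph over the features.
sims = feats @ feats.T
np.fill_diagonal(sims, -np.inf)
k = 5
adjacency = np.zeros_like(sims)
rows = np.arange(len(feats))[:, None]
adjacency[rows, np.argsort(-sims, axis=1)[:, :k]] = 1.0
adjacency = np.maximum(adjacency, adjacency.T)     # make the graph undirected

# Graph Laplacian L = D - A; the smoothness term tr(F^T L F) equals
# 0.5 * sum_ij A_ij ||f_i - f_j||^2 and is small when connected samples agree.
degree = np.diag(adjacency.sum(axis=1))
laplacian = degree - adjacency
smoothness = np.trace(feats.T @ laplacian @ feats)
print(f"graph smoothness penalty: {smoothness:.2f}")
```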
arXiv Detail & Related papers (2021-04-03T15:32:49Z)
- Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels.
By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
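The summary stays at a high level, so the sketch below shows only the general recipe it hints at, with my own assumptions (a coarse-label cross-entropy combined with an InfoNCE-style instance term); it is not Grafit's actual training objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

batch, dim, n_coarse = 64, 128, 5
z1 = rng.normal(size=(batch, dim))          # embeddings of one augmented view
z2 = rng.normal(size=(batch, dim))          # embeddings of a second view of the same images
z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
z2 /= np.linalg.norm(z2, axis=1, keepdims=True)

coarse_labels = rng.integers(0, n_coarse, size=batch)
coarse_logits = rng.normal(size=(batch, n_coarse))   # stand-in classifier outputs

# Coarse-label cross-entropy keeps the embedding aligned with the given labels.
ce = -np.log(softmax(coarse_logits)[np.arange(batch), coarse_labels] + 1e-12).mean()

# Instance-level InfoNCE term: each view-1 embedding must pick out its own view-2
# counterpart among the batch, which preserves structure finer than the coarse labels.
tau = 0.1
logits = z1 @ z2.T / tau
nce = -np.log(softmax(logits)[np.arange(batch), np.arange(batch)] + 1e-12).mean()

print(f"coarse CE {ce:.3f} + instance NCE {nce:.3f} = {ce + nce:.3f}")
```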
arXiv Detail & Related papers (2020-11-25T19:06:26Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, demonstrating its efficacy and generalizability.
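A generic co-attention sketch, not the paper's exact two-attention design: an affinity matrix between the spatial features of two images lets each image gather the other's semantically similar regions. Shapes and the bilinear weight are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

hw, c = 49, 256                              # a 7x7 spatial grid with 256 channels
f1 = rng.normal(size=(hw, c))                # flattened feature map of image 1
f2 = rng.normal(size=(hw, c))                # flattened feature map of image 2
w = rng.normal(size=(c, c)) * 0.01           # bilinear weight (learnable in a real model)

affinity = f1 @ w @ f2.T                        # region-to-region affinity, shape (hw, hw)
att_1_from_2 = softmax(affinity, axis=1) @ f2   # image-1 regions enriched with image-2 context
att_2_from_1 = softmax(affinity, axis=0).T @ f1 # image-2 regions enriched with image-1 context
print(att_1_from_2.shape, att_2_from_1.shape)
```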
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
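A small sketch of the decoupled recipe: neighbours are mined in a (here randomly simulated) self-supervised feature space, then a clustering head is asked to give each sample the same assignment as its mined neighbours. The consistency term mirrors the spirit of SCAN's loss but is not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

feats = rng.normal(size=(1000, 128))         # stand-in for pretrained self-supervised features
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# Step 1: mine each image's k nearest neighbours in the feature space.
k = 20
sims = feats @ feats.T
np.fill_diagonal(sims, -np.inf)
neighbours = np.argsort(-sims, axis=1)[:, :k]

# Step 2 (stand-in): a clustering head's soft assignments; the consistency term rewards
# giving each image the same assignment as its mined neighbours.
probs = softmax(rng.normal(size=(1000, 10)))
consistency = -np.log((probs[:, None, :] * probs[neighbours]).sum(-1) + 1e-12).mean()
print(f"neighbour-consistency loss on random assignments: {consistency:.3f}")
```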
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
- One-Shot Image Classification by Learning to Restore Prototypes [11.448423413463916]
One-shot image classification aims to train image classifiers over the dataset with only one image per category.
For one-shot learning, existing metric learning approaches suffer from poor performance because the single training image may not be representative of the class.
We propose a simple yet effective regression model, denoted by RestoreNet, which learns a class transformation on the image feature to move the image closer to the class center in the feature space.
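A minimal sketch of the idea as summarised: learn a mapping from an image's feature toward its class centre on base classes, then apply it to the single support image of a novel class. The ridge regressor and synthetic Gaussian features are stand-in assumptions, not RestoreNet itself.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_classes, per_class, dim = 20, 30, 64
centers = rng.normal(size=(n_classes, dim))

# Base-class data: noisy image features, with each feature's class centre as the target.
labels = np.repeat(np.arange(n_classes), per_class)
feats = centers[labels] + rng.normal(scale=0.5, size=(n_classes * per_class, dim))
restorer = Ridge(alpha=1.0).fit(feats, centers[labels])

# One-shot use: "restore" a prototype from the single support image of a novel class.
novel_center = rng.normal(size=dim)
support = novel_center + rng.normal(scale=0.5, size=dim)
restored = restorer.predict(support[None, :])[0]
print("distance to true centre before:", round(float(np.linalg.norm(support - novel_center)), 2),
      "after:", round(float(np.linalg.norm(restored - novel_center)), 2))
```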
arXiv Detail & Related papers (2020-05-04T02:11:30Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
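The summary gives no detail, so below is a hedged sketch of one common way to encourage background invariance in the data itself (a copy-paste style augmentation using a foreground mask); the mask and images are stand-ins, and this is an illustration rather than the paper's exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)          # original image
background = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)   # unrelated background
mask = np.zeros((64, 64, 1), dtype=bool)
mask[16:48, 16:48] = True                     # stand-in for a saliency / foreground mask

# Keep the object, swap the scene: two such views share the foreground but not the background.
augmented = np.where(mask, img, background)
print(augmented.shape)
```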
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.