Related papers: Organizing Unstructured Image Collections using Natural Language

URL: http://arxiv.org/abs/2410.05217v2
Date: Mon, 14 Oct 2024 18:47:46 GMT
Title: Organizing Unstructured Image Collections using Natural Language
Authors: Mingxuan Liu, Zhun Zhong, Jun Li, Gianni Franchi, Subhankar Roy, Elisa Ricci
Abstract summary: We introduce the task Semantic Multiple Clustering (SMC) that aims to automatically discover clustering criteria from large image collections. Our framework, Text Driven Semantic Multiple Clustering (TeDeSC), uses text as a proxy to concurrently reason over large image collections. We apply TeDeSC to various applications, such as discovering biases and analyzing social media image popularity.
Score: 37.16101036513514
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Organizing unstructured visual data into semantic clusters is a key challenge in computer vision. Traditional deep clustering (DC) approaches focus on a single partition of data, while multiple clustering (MC) methods address this limitation by uncovering distinct clustering solutions. The rise of large language models (LLMs) and multimodal LLMs (MLLMs) has enhanced MC by allowing users to define clustering criteria in natural language. However, manually specifying criteria for large datasets is impractical. In this work, we introduce the task Semantic Multiple Clustering (SMC) that aims to automatically discover clustering criteria from large image collections, uncovering interpretable substructures without requiring human input. Our framework, Text Driven Semantic Multiple Clustering (TeDeSC), uses text as a proxy to concurrently reason over large image collections, discover partitioning criteria, expressed in natural language, and reveal semantic substructures. To evaluate TeDeSC, we introduce the COCO-4c and Food-4c benchmarks, each containing four grouping criteria and ground-truth annotations. We apply TeDeSC to various applications, such as discovering biases and analyzing social media image popularity, demonstrating its utility as a tool for automatically organizing image collections and revealing novel insights.

Related papers

ESMC: MLLM-Based Embedding Selection for Explainable Multiple Clustering [79.69917150582633]
Multi-modal large language models (MLLMs) can be leveraged to achieve user-driven clustering. Our method first discovers that MLLMs' hidden states of text tokens are strongly related to the corresponding features. We also employ a lightweight clustering head augmented with pseudo-label learning, significantly enhancing clustering accuracy.
arXiv Detail & Related papers (2025-11-30T04:36:51Z)
Parameter-Free Clustering via Self-Supervised Consensus Maximization (Extended Version) [50.41628860536753]
We propose a novel and fully parameter-free clustering framework via Self-supervised Consensus Maximization, named SCMax. Our framework performs hierarchical agglomerative clustering and cluster evaluation in a single, integrated process.
arXiv Detail & Related papers (2025-11-12T11:17:17Z)
Interpretable Text-Guided Image Clustering via Iterative Search [22.032915471999864]
Distinct users may want to use different criteria to form clusters in the same data. We propose a new text-guided clustering approach named ITGC that uses an iterative discovery process, guided by an unsupervised clustering objective. We report superior performance compared to existing methods across a wide variety of image clustering and fine-grained classification benchmarks.
arXiv Detail & Related papers (2025-06-14T13:58:56Z)
Agent-Centric Personalized Multiple Clustering with Multi-Modal LLMs [40.38930402847949]
We propose an agent-centric personalized clustering framework. Agents traverse a relational graph to search for clusters based on user interests. Results show that the proposed method achieves NMI scores of 0.9667 and 0.9481 on the Card Order and Card Suits benchmarks.
arXiv Detail & Related papers (2025-03-28T08:45:15Z)
Personalized Clustering via Targeted Representation Learning [12.685373069492448]
Clustering traditionally aims to reveal a natural grouping structure within unlabeled data. We propose a personalized clustering method that explicitly performs targeted representation learning.
arXiv Detail & Related papers (2024-12-18T10:28:51Z)
Text-Guided Alternative Image Clustering [11.103514372355088]
This work explores the potential of large vision-language models to facilitate alternative image clustering. We propose Text-Guided Alternative Image Consensus Clustering (TGAICC), a novel approach that leverages user-specified interests via prompts. TGAICC outperforms image- and text-based baselines on four alternative image clustering benchmark datasets.
arXiv Detail & Related papers (2024-06-07T08:37:57Z)
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances [24.142013877384603]
This paper introduces a novel unsupervised multimodal clustering method (UMC), making a pioneering contribution to this field. UMC introduces a unique approach to constructing augmentation views for multimodal data, which are then used to perform pre-training. We show remarkable improvements of 2-6% in clustering metrics over state-of-the-art methods, marking the first successful endeavor in this domain.
arXiv Detail & Related papers (2024-05-21T13:24:07Z)
Vocabulary-free Image Classification and Semantic Segmentation [71.78089106671581]
We introduce the Vocabulary-free Image Classification (VIC) task, which aims to assign a class from an unconstrained language-induced semantic space to an input image without needing a known vocabulary. VIC is challenging due to the vastness of the semantic space, which contains millions of concepts, including fine-grained categories. We propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database.
arXiv Detail & Related papers (2024-04-16T19:27:21Z)
Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task. We propose a co-training-based framework that encourages clustering consistency. Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
Image Clustering Conditioned on Text Criteria [14.704110575570166]
We present a new method for performing image clustering based on user-specified text criteria. We call our method Image Clustering Conditioned on Text Criteria (IC|TC). IC|TC requires a minimal and practical degree of human intervention and grants the user significant control over the clustering results in return.
arXiv Detail & Related papers (2023-10-27T17:35:01Z)
Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification. In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary. We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z)
Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering. In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
Large Language Models Enable Few-Shot Clustering [88.06276828752553]
We show that large language models can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering. We find incorporating LLMs in the first two stages can routinely provide significant improvements in cluster quality.
arXiv Detail & Related papers (2023-07-02T09:17:11Z)
CLIP-GCD: Simple Language Guided Generalized Category Discovery [21.778676607030253]
Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data. Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the labeled data, followed by simple clustering methods. We propose to leverage multi-modal (vision and language) models, in two complementary ways.
arXiv Detail & Related papers (2023-05-17T17:55:33Z)
CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering [16.08402937918212]
We propose a novel Classification-Enhanced Iterative Learning framework for short text clustering. In each iteration, we first adopt a language model to retrieve the initial text representations. After strict data filtering and aggregation processes, samples with clean category labels are retrieved, which serve as supervision information. Finally, the updated language model with improved representation ability is used to enhance clustering in the next iteration.
arXiv Detail & Related papers (2023-04-20T14:04:31Z)
Deep Clustering: A Comprehensive Survey [53.387957674512585]
Clustering analysis plays an indispensable role in machine learning and data mining. Deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks. Existing surveys for deep clustering mainly focus on the single-view fields and the network architectures, ignoring the complex application scenarios of clustering.
arXiv Detail & Related papers (2022-10-09T02:31:32Z)
Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC). In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss is designed for node representation learning. For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation. We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one. By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.