IOCC: Aligning Semantic and Cluster Centers for Few-shot Short Text Clustering
- URL: http://arxiv.org/abs/2508.06126v1
- Date: Fri, 08 Aug 2025 08:47:13 GMT
- Title: IOCC: Aligning Semantic and Cluster Centers for Few-shot Short Text Clustering
- Authors: Jixuan Yin, Zhihao Yao, Wenshuai Huo, Xinmiao Yu, Xiaocheng Feng, Bo Li,
- Abstract summary: In clustering tasks, it is essential to structure the feature space into clear, well-separated distributions.<n>We propose IOCC, a novel few-shot contrastive learning method that achieves alignment between the cluster centers and semantic centers.<n>IOCC outperforms previous methods, achieving up to 7.34% improvement on challenging Biomedical dataset.
- Score: 15.657808381423736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In clustering tasks, it is essential to structure the feature space into clear, well-separated distributions. However, because short text representations have limited expressiveness, conventional methods struggle to identify cluster centers that truly capture each category's underlying semantics, causing the representations to be optimized in suboptimal directions. To address this issue, we propose IOCC, a novel few-shot contrastive learning method that achieves alignment between the cluster centers and the semantic centers. IOCC consists of two key modules: Interaction-enhanced Optimal Transport (IEOT) and Center-aware Contrastive Learning (CACL). Specifically, IEOT incorporates semantic interactions between individual samples into the conventional optimal transport problem, and generate pseudo-labels. Based on these pseudo-labels, we aggregate high-confidence samples to construct pseudo-centers that approximate the semantic centers. Next, CACL optimizes text representations toward their corresponding pseudo-centers. As training progresses, the collaboration between the two modules gradually reduces the gap between cluster centers and semantic centers. Therefore, the model will learn a high-quality distribution, improving clustering performance. Extensive experiments on eight benchmark datasets show that IOCC outperforms previous methods, achieving up to 7.34\% improvement on challenging Biomedical dataset and also excelling in clustering stability and efficiency. The code is available at: https://anonymous.4open.science/r/IOCC-C438.
Related papers
- You Can Trust Your Clustering Model: A Parameter-free Self-Boosting Plug-in for Deep Clustering [73.48306836608124]
DCBoost is a parameter-free plug-in designed to enhance the global feature structures of current deep clustering models.<n>By harnessing reliable local structural cues, our method aims to elevate clustering performance effectively.
arXiv Detail & Related papers (2025-11-26T09:16:36Z) - Self-Enhanced Image Clustering with Cross-Modal Semantic Consistency [57.961869351897384]
We propose a framework based on cross-modal semantic consistency for efficient image clustering.<n>Our framework first builds a strong foundation via Cross-Modal Semantic Consistency.<n>In the first stage, we train lightweight clustering heads to align with the rich semantics of the pre-trained model.<n>In the second stage, we introduce a Self-Enhanced fine-tuning strategy.
arXiv Detail & Related papers (2025-08-02T08:12:57Z) - A Dynamic Framework for Semantic Grouping of Common Data Elements (CDE) Using Embeddings and Clustering [0.782496834711349]
This research aims to develop a dynamic and scalable framework to facilitate harmonization of Common Data Elements (CDEs) across heterogeneous biomedical datasets.<n>Our methodology leverages Large Language Models (LLMs) for context-aware text embeddings that convert CDEs into dense vectors capturing semantic relationships and patterns.
arXiv Detail & Related papers (2025-06-02T18:43:37Z) - Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning [65.75756724642932]
In incomplete multi-view clustering, missing data induce prototype shifts within views and semantic inconsistencies across views.<n>We propose an IMVC framework, imputation- and alignment-free for consensus semantics learning (FreeCSL)<n>FreeCSL achieves more confident and robust assignments on IMVC task, compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-05-16T12:37:10Z) - How to optimize K-means? [8.206124331448931]
Center-based clustering algorithms (e.g., K-means) are popular for clustering tasks, but they usually struggle to achieve high accuracy on complex datasets.<n>We believe the main reason is that traditional center-based clustering algorithms identify only one clustering center in each cluster.<n>We propose a general optimization method called ECAC, and it can optimize different center-based clustering algorithms.
arXiv Detail & Related papers (2025-03-25T03:37:52Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, NCC aims to learn representations to construct a metric space where few-shot classification can be performed.
In this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes.
We propose a bi-level optimization framework, emphmaximizing optimized kernel dependence (MOKD) to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
arXiv Detail & Related papers (2024-05-29T05:59:52Z) - Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens [57.37893387775829]
We introduce a fast and balanced clustering method, named Semantic Equitable Clustering (SEC)<n>SEC clusters tokens based on their global semantic relevance in an efficient, straightforward manner.<n>We propose a versatile vision backbone, SECViT, to serve as a vision language connector.
arXiv Detail & Related papers (2024-05-22T04:49:00Z) - End-to-end Learnable Clustering for Intent Learning in Recommendation [54.157784572994316]
We propose a novel intent learning method termed underlineELCRec.
It unifies behavior representation learning into an underlineEnd-to-end underlineLearnable underlineClustering framework.
We deploy this method on the industrial recommendation system with 130 million page views and achieve promising results.
arXiv Detail & Related papers (2024-01-11T15:22:55Z) - Very Compact Clusters with Structural Regularization via Similarity and
Connectivity [3.779514860341336]
We propose an end-to-end deep clustering algorithm, i.e., Very Compact Clusters (VCC) for the general datasets.
Our proposed approach achieves better clustering performance over most of the state-of-the-art clustering methods.
arXiv Detail & Related papers (2021-06-09T23:22:03Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - A Classification-Based Approach to Semi-Supervised Clustering with
Pairwise Constraints [5.639904484784126]
We introduce a network framework for semi-supervised clustering with pairwise constraints.
In contrast to existing approaches, we decompose SSC into two simpler classification tasks/stages.
The proposed approach, S3C2, is motivated by the observation that binary classification is usually easier than multi-class clustering.
arXiv Detail & Related papers (2020-01-18T20:13:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.